Preface


This book provides a fundamental introduction to numerical analysis suitable for
undergraduate students in mathematics, computer science, physical sciences, and
engineering. It is assumed that the reader is familiar with calculus and has taken a
structured programming course. The text contains enough material, organized modularly,
for either a single-term course or a year-long sequence, so instructors will be able to
select topics appropriate to their needs.

Students of various backgrounds should find numerical methods quite interesting
and useful, and this is kept in mind throughout the book. Thus, there is a wide variety
of examples and problems that help to sharpen one's skill in both the theory and
practice of numerical analysis. Computer calculations are presented in the form of
tables and graphs whenever possible so that the resulting numerical approximations are
easier to visualize and interpret. MATLAB programs are the vehicle for presenting the
underlying numerical algorithms.

Emphasis is placed on understanding why numerical methods work and what their
limitations are. This is challenging and involves a balance between theory, error analysis,
and readability. An error analysis for each method is presented in a fashion that is
appropriate for the method at hand, yet does not turn off the reader. A mathematical
derivation for each method is given that uses elementary results and builds the student's
understanding of calculus. Computer assignments using MATLAB give students an
opportunity to practice their skills at scientific programming.

Shorter numerical exercises can be carried out with a pocket calculator/computer,
and the longer ones can be done using MATLAB subroutines. It is left for the instructor
to guide the students regarding the pedagogical use of numerical computations.
Each instructor can make assignments that are appropriate to the available computing
resources. Experimentation with the MATLAB subroutine libraries is encouraged.
These materials can be used to assist students in the completion of the numerical
analysis component of computer laboratory exercises.

This Third Edition grows out of much polishing of the narrative for the Second 
Edition. For example, the QR method has been added to the chapter on Eigenvalues 
and Eigenvectors. New to this edition is the explicit use of the software MATLAB. 
An appendix gives an introduction to MATLAB syntax. Examples have been added 
throughout the text with MATLAB and complete MATLAB programs are given in 
each section. An instructor's disk is available upon request from the publisher. 

Previously we took the attitude that any software program that students mastered
would work fine. However, many students entering this course have yet to master a
programming language (computer science students excepted). MATLAB has become
the tool of nearly all engineers and applied mathematicians, and its newest versions
have improved the programming aspects. So we think that students will have an easier
and more productive time in this MATLAB version of our text.

Preliminaries


Consider the function $f(x) = \cos(x)$, its derivative $f'(x) = -\sin(x)$, and its
antiderivative $F(x) = \sin(x) + C$. These formulas were studied in calculus. The former
is used to determine the slope $m = f'(x_0)$ of the curve $y = f(x)$ at a point $(x_0, f(x_0))$,
and the latter is used to compute the area under the curve for $a \le x \le b$.

The slope at the point $(\pi/2, 0)$ is $m = f'(\pi/2) = -1$ and can be used to find the
tangent line at this point (see Figure 1.1(a)):

$$y = m\left(x - \frac{\pi}{2}\right) + 0 = -x + \frac{\pi}{2}.$$

The area under the curve for $0 \le x \le \pi/2$ is computed using an integral (see
Figure 1.1(b)):

$$\text{area} = \int_0^{\pi/2} \cos(x)\,dx = F\left(\frac{\pi}{2}\right) - F(0) = \sin\left(\frac{\pi}{2}\right) - 0 = 1.$$

These are some of the results that we will need to use from calculus. 


1.1 Review of Calculus

It is assumed that the reader is familiar with the notation and subject matter covered in
the undergraduate calculus sequence. This should have included the topics of limits,
continuity, differentiation, integration, sequences, and series. Throughout the book we
refer to the following results.

Limits and Continuity 

Definition 1.1. Assume that $f(x)$ is defined on a set $S$ of real numbers. Then $f$ is
said to have the limit $L$ at $x = x_0$, and we write

$$\lim_{x \to x_0} f(x) = L, \tag{1}$$

if, given any $\epsilon > 0$, there exists a $\delta > 0$ such that, whenever $x \in S$,
$0 < |x - x_0| < \delta$ implies that $|f(x) - L| < \epsilon$. When the $h$-increment notation
$x = x_0 + h$ is used, equation (1) becomes

$$\lim_{h \to 0} f(x_0 + h) = L. \tag{2}$$

▲


Definition 1.2. Assume that $f(x)$ is defined on a set $S$ of real numbers and let $x_0 \in S$.
Then $f$ is said to be continuous at $x = x_0$ if

$$\lim_{x \to x_0} f(x) = f(x_0). \tag{3}$$

The function $f$ is said to be continuous on $S$ if it is continuous at each point $x \in S$.
The notation $C^n(S)$ stands for the set of all functions $f$ such that $f$ and its first $n$
derivatives are continuous on $S$. When $S$ is an interval, say $[a, b]$, then the notation
$C^n[a, b]$ is used. As an example, consider the function $f(x) = x^{4/3}$ on the interval
$[-1, 1]$. Clearly, $f(x)$ and $f'(x) = (4/3)x^{1/3}$ are continuous on $[-1, 1]$, while
$f''(x) = (4/9)x^{-2/3}$ is not continuous at $x = 0$. ▲

Definition 1.3. Suppose that $\{x_n\}_{n=1}^{\infty}$ is an infinite sequence. Then the sequence is
said to have the limit $L$, and we write

$$\lim_{n \to \infty} x_n = L, \tag{4}$$

if, given any $\epsilon > 0$, there exists a positive integer $N = N(\epsilon)$ such that $n > N$ implies
that $|x_n - L| < \epsilon$. ▲

When a sequence has a limit, we say that it is a convergent sequence. Another
commonly used notation is "$x_n \to L$ as $n \to \infty$." Equation (4) is equivalent to

$$\lim_{n \to \infty} (x_n - L) = 0. \tag{5}$$

Thus we can view the sequence $\{\epsilon_n\}_{n=1}^{\infty} = \{x_n - L\}_{n=1}^{\infty}$ as an error sequence. The
following theorem relates the concepts of continuity and convergent sequence.

Theorem 1.1. Assume that $f(x)$ is defined on the set $S$ and $x_0 \in S$. The following
statements are equivalent:

(a) The function $f$ is continuous at $x_0$.

(b) If $\lim_{n \to \infty} x_n = x_0$, then $\lim_{n \to \infty} f(x_n) = f(x_0)$.

Theorem 1.2 (Intermediate Value Theorem). Assume that $f \in C[a, b]$ and $L$ is
any number between $f(a)$ and $f(b)$. Then there exists a number $c$, with $c \in (a, b)$,
such that $f(c) = L$.

Example 1.1. The function $f(x) = \cos(x - 1)$ is continuous over $[0, 1]$, and the constant
$L = 0.8$ lies between $f(0) = \cos(1)$ and $f(1) = \cos(0)$. The solution to $f(x) = 0.8$ over
$[0, 1]$ is $c_1 = 0.356499$. Similarly, $f(x)$ is continuous over $[1, 2.5]$, and $L = 0.8$ lies
between $f(2.5) = \cos(1.5)$ and $f(1) = \cos(0)$. The solution to $f(x) = 0.8$ over $[1, 2.5]$ is
$c_2 = 1.643501$. These two cases are shown in Figure 1.2. ■
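
The two solutions in Example 1.1 can be checked numerically. The following is a minimal Python sketch, not part of the original text (which uses MATLAB); it assumes a simple bisection loop, and the helper name `bisect` and the tolerance are illustrative choices. Bisection is used only because the intermediate value theorem guarantees a sign change of $f(x) - L$ on each interval.

```python
import math

def f(x):
    return math.cos(x - 1.0)

def bisect(g, lo, hi, tol=1e-10):
    """Bisection (hypothetical helper): assumes g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_lo < 0) == (g_mid < 0):
            lo, g_lo = mid, g_mid   # the sign change lies in [mid, hi]
        else:
            hi = mid                # the sign change lies in [lo, mid]
    return 0.5 * (lo + hi)

L = 0.8
c1 = bisect(lambda x: f(x) - L, 0.0, 1.0)
c2 = bisect(lambda x: f(x) - L, 1.0, 2.5)
print(f"c1 = {c1:.6f}, c2 = {c2:.6f}")
```

The printed values can be compared with $c_1$ and $c_2$ quoted in the example.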
Figure 1.2 The intermediate value theorem applied to the function
$f(x) = \cos(x - 1)$ over $[0, 1]$ and over the interval $[1, 2.5]$.

Figure 1.3 The extreme value theorem applied to the function
$f(x) = 35 + 59.5x - 66.5x^2 + 15x^3$ over the interval $[0, 3]$.


Theorem 1.3 (Extreme Value Theorem for a Continuous Function). Assume that
$f \in C[a, b]$. Then there exists a lower bound $M_1$, an upper bound $M_2$, and two
numbers $x_1, x_2 \in [a, b]$ such that

$$M_1 = f(x_1) \le f(x) \le f(x_2) = M_2 \quad \text{whenever } x \in [a, b]. \tag{7}$$

We sometimes express this by writing

$$M_1 = f(x_1) = \min_{a \le x \le b} \{f(x)\} \quad \text{and} \quad M_2 = f(x_2) = \max_{a \le x \le b} \{f(x)\}. \tag{8}$$


Differentiable Functions 

Definition 1.4. Assume that $f(x)$ is defined on an open interval containing $x_0$. Then
$f$ is said to be differentiable at $x_0$ if

$$\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} \tag{9}$$

exists. When this limit exists, it is denoted by $f'(x_0)$ and is called the derivative of $f$
at $x_0$. An equivalent way to express this limit is to use the $h$-increment notation:

$$\lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} = f'(x_0). \tag{10}$$

A function that has a derivative at each point in a set $S$ is said to be differentiable
on $S$. Note that the number $m = f'(x_0)$ is the slope of the tangent line to the graph of
the function $y = f(x)$ at the point $(x_0, f(x_0))$. ▲


Theorem 1.4. If $f(x)$ is differentiable at $x = x_0$, then $f(x)$ is continuous at $x = x_0$.

It follows from Theorem 1.3 that, if a function $f$ is differentiable on a closed
interval $[a, b]$, then its extreme values occur at the end points of the interval or at the
critical points (solutions of $f'(x) = 0$) in the open interval $(a, b)$.

Example 1.2. The function $f(x) = 15x^3 - 66.5x^2 + 59.5x + 35$ is differentiable on $[0, 3]$.
The solutions to $f'(x) = 45x^2 - 133x + 59.5 = 0$ are $x_1 = 0.54955$ and $x_2 = 2.40601$.
The maximum and minimum values of $f$ on $[0, 3]$ are:

$$\min\{f(0), f(3), f(x_1), f(x_2)\} = \min\{35, 20, 50.10438, 2.11850\} = 2.11850$$

and

$$\max\{f(0), f(3), f(x_1), f(x_2)\} = \max\{35, 20, 50.10438, 2.11850\} = 50.10438.$$

■
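
The procedure in Example 1.2 (evaluate $f$ at the end points and at the critical points, then compare) can be sketched in a few lines. This is an illustrative Python sketch rather than the book's MATLAB; the variable names are assumptions, and the critical points come from the quadratic formula applied to $f'(x) = 45x^2 - 133x + 59.5$.

```python
import math

def f(x):
    return 15.0 * x**3 - 66.5 * x**2 + 59.5 * x + 35.0

# f'(x) = 45x^2 - 133x + 59.5; its roots are the critical points.
a, b, c = 45.0, -133.0, 59.5
disc = math.sqrt(b * b - 4.0 * a * c)
critical = [(-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)]

# Candidates: both end points plus the critical points inside (0, 3).
candidates = [0.0, 3.0] + [x for x in critical if 0.0 < x < 3.0]
values = [f(x) for x in candidates]
f_min, f_max = min(values), max(values)
print(critical, f_min, f_max)
```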


Theorem 1.5 (Rolle's Theorem). Assume that $f \in C[a, b]$ and that $f'(x)$ exists for
all $x \in (a, b)$. If $f(a) = f(b) = 0$, then there exists a number $c$, with $c \in (a, b)$, such
that $f'(c) = 0$.

Theorem 1.6 (Mean Value Theorem). Assume that $f \in C[a, b]$ and that $f'(x)$
exists for all $x \in (a, b)$. Then there exists a number $c$, with $c \in (a, b)$, such that

$$f'(c) = \frac{f(b) - f(a)}{b - a}. \tag{11}$$

Geometrically, the Mean Value Theorem says that there is at least one number
$c \in (a, b)$ such that the slope of the tangent line to the graph of $y = f(x)$ at the point
$(c, f(c))$ equals the slope of the secant line through the points $(a, f(a))$ and $(b, f(b))$.


Example 1.3. The function $f(x) = \sin(x)$ is continuous on the closed interval $[0.1, 2.1]$
and differentiable on the open interval $(0.1, 2.1)$. Thus, by the Mean Value Theorem, there
is a number $c$ such that

$$f'(c) = \frac{f(2.1) - f(0.1)}{2.1 - 0.1} = \frac{0.863209 - 0.099833}{2.1 - 0.1} = 0.381688.$$

The solution to $f'(c) = \cos(c) = 0.381688$ in the interval $(0.1, 2.1)$ is $c = 1.179174$.
The graphs of $f(x)$, the secant line $y = 0.381688(x - 0.1) + 0.099833 = 0.381688x + 0.061665$,
and the tangent line $y = 0.381688x + 0.474215$ are shown in Figure 1.4. ■
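
The numbers in Example 1.3 can be reproduced directly. The following Python sketch is illustrative only (the text's programs are in MATLAB, and the variable names are assumptions); it computes the secant slope, solves $\cos(c) = \text{slope}$ with the inverse cosine, and finds the intercepts of the secant and tangent lines.

```python
import math

a, b = 0.1, 2.1
slope = (math.sin(b) - math.sin(a)) / (b - a)   # slope of the secant line
c = math.acos(slope)                            # solves cos(c) = slope on (0, pi)

# Intercepts of the secant line through (a, sin(a)) and of the
# tangent line at (c, sin(c)); both lines share the same slope.
secant_intercept = math.sin(a) - slope * a
tangent_intercept = math.sin(c) - slope * c
print(slope, c, secant_intercept, tangent_intercept)
```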
Figure 1.4 The mean value theorem applied to $f(x) = \sin(x)$ over the interval $[0.1, 2.1]$.

Theorem 1.7 (Generalized Rolle's Theorem). Assume that $f \in C[a, b]$ and that
$f'(x), f''(x), \ldots, f^{(n)}(x)$ exist over $(a, b)$ and $x_0, x_1, \ldots, x_n \in [a, b]$. If $f(x_j) = 0$
for $j = 0, 1, \ldots, n$, then there exists a number $c$, with $c \in (a, b)$, such that $f^{(n)}(c) = 0$.

Integrals 

Theorem 1.8 (First Fundamental Theorem). If $f$ is continuous over $[a, b]$ and $F$
is any antiderivative of $f$ on $[a, b]$, then

$$\int_a^b f(x)\,dx = F(b) - F(a) \quad \text{where } F'(x) = f(x). \tag{12}$$

Theorem 1.9 (Second Fundamental Theorem). If $f$ is continuous over $[a, b]$ and
$x \in (a, b)$, then

$$\frac{d}{dx} \int_a^x f(t)\,dt = f(x).$$

Example 1.4. The function $f(x) = \cos(x)$ satisfies the hypotheses of Theorem 1.9 over
the interval $[0, \pi/2]$; thus by the chain rule

$$\frac{d}{dx} \int_0^{x^2} \cos(t)\,dt = \cos(x^2)\,(x^2)' = 2x\cos(x^2).$$

■

Theorem 1.10 (Mean Value Theorem for Integrals). Assume that $f \in C[a, b]$.
Then there exists a number $c$, with $c \in (a, b)$, such that

$$\frac{1}{b - a} \int_a^b f(x)\,dx = f(c).$$

The value $f(c)$ is the average value of $f$ over the interval $[a, b]$.
Figure 1.5 The mean value theorem for integrals applied to
$f(x) = \sin(x) + \frac{1}{3}\sin(3x)$ over the interval $[0, 2.5]$.


Example 1.5. The function $f(x) = \sin(x) + \frac{1}{3}\sin(3x)$ satisfies the hypotheses of
Theorem 1.10 over the interval $[0, 2.5]$. An antiderivative of $f(x)$ is
$F(x) = -\cos(x) - \frac{1}{9}\cos(3x)$. The average value of the function $f(x)$ over the
interval $[0, 2.5]$ is:

$$\frac{1}{2.5 - 0} \int_0^{2.5} f(x)\,dx = \frac{F(2.5) - F(0)}{2.5} = \frac{1.873740}{2.5} = 0.749496.$$

There are three solutions to the equation $f(c) = 0.749496$ over the interval $[0, 2.5]$:
$c_1 = 0.440566$, $c_2 = 1.268010$, and $c_3 = 1.873583$. The area of the rectangle with
base $b - a = 2.5$ and height $f(c_j) = 0.749496$ is $f(c_j)(b - a) = 1.873740$. The area
of the rectangle has the same numerical value as the integral of $f(x)$ taken over the
interval $[0, 2.5]$. A comparison of the area under the curve $y = f(x)$ and that of the rectangle
can be seen in Figure 1.5. ■
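
As a quick check of Example 1.5, the average value can be computed from the antiderivative and the three quoted values of $c$ substituted back into $f$. This is a Python sketch under the assumption that the function is $f(x) = \sin(x) + \frac{1}{3}\sin(3x)$, as in the example; the names `area`, `average`, and `residuals` are illustrative.

```python
import math

def f(x):
    return math.sin(x) + math.sin(3.0 * x) / 3.0

def F(x):
    # An antiderivative of f (constant of integration omitted).
    return -math.cos(x) - math.cos(3.0 * x) / 9.0

a, b = 0.0, 2.5
area = F(b) - F(a)            # first fundamental theorem
average = area / (b - a)      # mean value theorem for integrals

# Each c from the example should satisfy f(c) = average, up to rounding.
residuals = [f(c) - average for c in (0.440566, 1.268010, 1.873583)]
print(area, average, residuals)
```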


Theorem 1.11 (Weighted Integral Mean Value Theorem). Assume that $f, g \in C[a, b]$
and $g(x) \ge 0$ for $x \in [a, b]$. Then there exists a number $c$, with $c \in (a, b)$,
such that

$$\int_a^b f(x)g(x)\,dx = f(c) \int_a^b g(x)\,dx.$$

Example 1.6. The functions $f(x) = \sin(x)$ and $g(x) = x^2$ satisfy the hypotheses of
Theorem 1.11 over the interval $[0, \pi/2]$. Thus there exists a number $c$ such that

$$\sin(c) = \frac{\int_0^{\pi/2} x^2 \sin(x)\,dx}{\int_0^{\pi/2} x^2\,dx} = \frac{1.14159}{1.29193} = 0.883631,$$

or $c = \sin^{-1}(0.883631) = 1.08356$. ■
Series

Definition 1.5. Let $\{a_n\}_{n=1}^{\infty}$ be a sequence. Then $\sum_{n=1}^{\infty} a_n$ is an infinite series. The
$n$th partial sum is $S_n = \sum_{k=1}^{n} a_k$. The infinite series converges if and only if the
sequence $\{S_n\}_{n=1}^{\infty}$ converges to a limit $S$, that is,

$$\lim_{n \to \infty} S_n = \lim_{n \to \infty} \sum_{k=1}^{n} a_k = S. \tag{15}$$

If a series does not converge, we say that it diverges. ▲


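Example 1.7. Consider the infinite sequence $\{a_n\}_{n=1}^{\infty} = \left\{\dfrac{1}{n(n+1)}\right\}_{n=1}^{\infty}$. Then the $n$th
partial sum is

$$S_n = \sum_{k=1}^{n} \frac{1}{k(k+1)} = \sum_{k=1}^{n} \left(\frac{1}{k} - \frac{1}{k+1}\right) = 1 - \frac{1}{n+1}.$$

Therefore, the sum of the infinite series is

$$S = \lim_{n \to \infty} S_n = \lim_{n \to \infty} \left(1 - \frac{1}{n+1}\right) = 1.$$

■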
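
The telescoping sum in Example 1.7 can be verified with exact rational arithmetic. The sketch below is in Python (not the MATLAB of the text) and uses the standard `fractions` module; the function name `partial_sum` is an illustrative assumption.

```python
from fractions import Fraction

def partial_sum(n):
    """Exact n-th partial sum of 1/(k(k+1)) for k = 1, ..., n."""
    return sum(Fraction(1, k * (k + 1)) for k in range(1, n + 1))

for n in (1, 10, 100):
    s_n = partial_sum(n)
    # Compare with the closed form 1 - 1/(n+1) from Example 1.7.
    print(n, s_n, s_n == 1 - Fraction(1, n + 1), float(1 - s_n))
```

The last column is the gap $1 - S_n = 1/(n+1)$, which shrinks toward zero as $n$ grows.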


Theorem 1.12 (Taylor's Theorem). Assume that $f \in C^{n+1}[a, b]$ and let $x_0 \in [a, b]$.
Then, for every $x \in (a, b)$, there exists a number $c = c(x)$ (the value of $c$
depends on the value of $x$) that lies between $x_0$ and $x$ such that

$$f(x) = P_n(x) + R_n(x), \tag{16}$$

where

$$P_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k \tag{17}$$

and

$$R_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!} (x - x_0)^{n+1}. \tag{18}$$

Example 1.8. The function $f(x) = \sin(x)$ satisfies the hypotheses of Theorem 1.12. The
Taylor polynomial $P_n(x)$ of degree $n = 9$ expanded about $x_0 = 0$ is obtained by evaluating
the following derivatives at $x = 0$:

$f(x) = \sin(x)$,            $f(0) = 0$,
$f'(x) = \cos(x)$,           $f'(0) = 1$,
$f''(x) = -\sin(x)$,         $f''(0) = 0$,
$f^{(3)}(x) = -\cos(x)$,     $f^{(3)}(0) = -1$

Figure 1.6 The graph of $f(x) = \sin(x)$ and the Taylor
polynomial $P(x) = x - x^3/3! + x^5/5! - x^7/7! + x^9/9!$.
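
To see how well the degree 9 Taylor polynomial of Example 1.8 tracks $\sin(x)$, one can evaluate both at a few points. This is an illustrative Python sketch (the text itself uses MATLAB); the function name `taylor_sin` and the sample points are assumptions made for the example.

```python
import math

def taylor_sin(x, degree=9):
    """Taylor polynomial of sin about x0 = 0, keeping terms up to x**degree."""
    total = 0.0
    for k in range(1, degree + 1, 2):          # only odd powers are nonzero
        sign = -1.0 if (k // 2) % 2 else 1.0   # signs +, -, +, - for k = 1, 3, 5, 7
        total += sign * x**k / math.factorial(k)
    return total

for x in (0.5, 1.0, 2.0):
    print(x, taylor_sin(x), math.sin(x), abs(taylor_sin(x) - math.sin(x)))
```

By the remainder term in Theorem 1.12, the error at $x$ is at most $|x|^{11}/11!$, because every derivative of $\sin$ is bounded by 1 and the $x^{10}$ coefficient vanishes, so the degree 9 and degree 10 polynomials coincide. The approximation therefore degrades as $|x|$ grows.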



Evaluation of a Polynomial 

Let the polynomial $P(x)$ of degree $n$ have the form

$$P(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_2 x^2 + a_1 x + a_0. \tag{20}$$

Horner's method or synthetic division is a technique for evaluating polynomials. It
can be thought of as nested multiplication. For example, a fifth-degree polynomial can
be written in the nested multiplication form

$$P_5(x) = ((((a_5 x + a_4)x + a_3)x + a_2)x + a_1)x + a_0.$$

Theorem 1.13 (Horner's Method for Polynomial Evaluation). Assume that $P(x)$
is the polynomial given in equation (20) and $x = c$ is a number for which $P(c)$ is to be
evaluated.

Set $b_n = a_n$ and compute

$$b_k = a_k + c\,b_{k+1} \quad \text{for } k = n-1, n-2, \ldots, 1, 0; \tag{21}$$

then $b_0 = P(c)$. Moreover, if

$$Q_0(x) = b_n x^{n-1} + b_{n-1} x^{n-2} + \cdots + b_3 x^2 + b_2 x + b_1, \tag{22}$$

then

$$P(x) = (x - c)Q_0(x) + R_0, \tag{23}$$

where $Q_0(x)$ is the quotient polynomial of degree $n - 1$ and $R_0 = b_0 = P(c)$ is the
remainder.

Proof. Substituting the right side of equation (22) for $Q_0(x)$ and $b_0$ for $R_0$ in
equation (23) yields

$$\begin{aligned}
P(x) &= (x - c)(b_n x^{n-1} + b_{n-1} x^{n-2} + \cdots + b_3 x^2 + b_2 x + b_1) + b_0 \\
     &= b_n x^n + (b_{n-1} - c\,b_n)x^{n-1} + \cdots + (b_2 - c\,b_3)x^2 \\
     &\quad + (b_1 - c\,b_2)x + (b_0 - c\,b_1).
\end{aligned} \tag{24}$$

The numbers $b_k$ are determined by comparing the coefficients of $x^k$ in equations (20)
and (24), as shown in Table 1.1.

The value $P(c) = b_0$ is easily obtained by substituting $x = c$ into equation (23)
and using the fact that $R_0 = b_0$:

$$P(c) = (c - c)Q_0(c) + R_0 = b_0. \tag{25}$$

•


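The recursive formula for $b_k$ given in (21) is easy to implement with a computer.
Because MATLAB array indices start at 1, the coefficient $a_k$ is stored in a(k+1) and
$b_k$ in b(k+1). A simple algorithm is

    b(n+1) = a(n+1);                    % b_n = a_n
    for k = n-1:-1:0
        b(k+1) = a(k+1) + c*b(k+2);     % b_k = a_k + c*b_{k+1}
    end
    % b(1) now holds b_0 = P(c)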


Table 1.1 Coefficients $b_k$ for Horner's Method

$x^k$         Comparing (20) and (24)           Solving for $b_k$
$x^n$         $a_n = b_n$                       $b_n = a_n$
$x^{n-1}$     $a_{n-1} = b_{n-1} - c\,b_n$      $b_{n-1} = a_{n-1} + c\,b_n$
  ...           ...                               ...
$x^k$         $a_k = b_k - c\,b_{k+1}$          $b_k = a_k + c\,b_{k+1}$
  ...           ...                               ...
$x^0$         $a_0 = b_0 - c\,b_1$              $b_0 = a_0 + c\,b_1$


Table 1.2 Horner's Table for the Synthetic Division Process

Input     $a_n$    $a_{n-1}$    $a_{n-2}$      ...   $a_k$          ...   $a_2$      $a_1$      $a_0$
$x = c$            $c\,b_n$     $c\,b_{n-1}$   ...   $c\,b_{k+1}$   ...   $c\,b_3$   $c\,b_2$   $c\,b_1$
          $b_n$    $b_{n-1}$    $b_{n-2}$      ...   $b_k$          ...   $b_2$      $b_1$      $b_0 = P(c)$
                                                                                                Output


When Horner's method is performed by hand, it is easier to write the coefficients of
$P(x)$ on a line and perform the calculation $b_k = a_k + c\,b_{k+1}$ below $a_k$ in a column.
The format for this procedure is illustrated in Table 1.2.


Example 1.9. Use synthetic division (Horner's method) to find $P(3)$ for the polynomial

$$P(x) = x^5 - 6x^4 + 8x^3 + 8x^2 + 4x - 40.$$

            $a_5$   $a_4$   $a_3$   $a_2$   $a_1$   $a_0$
Input         1      -6       8       8       4     -40
$x = 3$               3      -9      -3      15      57
              1      -3      -1       5      19      17 = P(3) = $b_0$
            $b_5$   $b_4$   $b_3$   $b_2$   $b_1$   Output

Therefore, $P(3) = 17$. ■
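
The synthetic division table of Example 1.9 can be reproduced with a short routine. The following is a Python sketch of recursion (21), offered as an illustration rather than as the text's MATLAB program; the function name `horner` and the convention of listing coefficients from highest degree to lowest are assumptions of this sketch.

```python
def horner(coeffs, c):
    """Evaluate P(c) by Horner's method.

    coeffs lists a_n, a_{n-1}, ..., a_0 (highest degree first).
    Returns (b_0, [b_n, ..., b_1]): the value P(c) and the
    coefficients of the quotient polynomial Q_0(x).
    """
    b = [coeffs[0]]                 # b_n = a_n
    for a_k in coeffs[1:]:
        b.append(a_k + c * b[-1])   # b_k = a_k + c * b_{k+1}
    return b[-1], b[:-1]

value, quotient = horner([1, -6, 8, 8, 4, -40], 3)
print(value, quotient)
```

The returned quotient coefficients correspond to the bottom row of the table, so that $P(x) = (x - 3)Q_0(x) + P(3)$.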
Exercises for Review of Calculus

1. (a) Find $L = \lim_{n \to \infty} (\ldots + 1)/(2n - 3)$. Then determine $\{\epsilon_n\} = \{L - x_n\}$ and find
$\lim_{n \to \infty} \epsilon_n$.

(b) Find $L = \lim_{n \to \infty} (\ldots)/(4n^2 + 2n + 1)$. Then determine $\{\epsilon_n\} = \{L - x_n\}$
and find $\lim_{n \to \infty} \epsilon_n$.

2. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence such that $\lim_{n \to \infty} x_n = 2$.

(a) Find $\lim_{n \to \infty} \sin(x_n)$. (b) Find $\lim_{n \to \infty} \ln(x_n^2)$.

3. Find the number(s) $c$ referred to in the intermediate value theorem for each function
over the indicated interval and for the given value of $L$.

(a) $f(x) = -x^2 + 2x + 3$ over $[-1, 0]$ using $L = 2$

(b) $f(x) = \sqrt{x^2 - \ldots - 2}$ over $[6, 8]$ using $L = 3$

4. Find the upper and lower bounds referred to in the extreme value theorem for each
function over the indicated interval.

(a) $f(x) = \ldots - 3x + 1$ over $[-1, 2]$

(b) $f(x) = \cos^2(x) - \sin(x)$ over $[0, 2\pi]$

5. Find the number(s) $c$ referred to in Rolle's theorem for each function over the
indicated interval.

(a) $f(x) = x^4 - 4x^2$ over $[-2, 2]$

(b) $f(x) = \sin(x) + \sin(2x)$ over $[0, 2\pi]$

6. Find the number(s) $c$ referred to in the mean value theorem for each function over the
indicated interval.

(a) $f(x) = \sqrt{x}$ over $[0, 4]$

(b) $f(x) = \dfrac{x^2}{x + 1}$ over $[0, 1]$

7. Apply the generalized Rolle's theorem to $f(x) = x(x - 1)(x - 3)$ over $[0, 3]$.

8. Apply the first fundamental theorem of calculus to each function over the indicated
interval.

(a) $f(x) = xe^x$ over $[0, 2]$

(b) $f(x) = \dfrac{\ldots}{x^2 + 1}$ over $[-1, 1]$

9. Apply the second fundamental theorem of calculus to each function.

(a) $\dfrac{d}{dx} \int t^2 \cos(t)\,dt$ (b) $\dfrac{d}{dx} \int e^{t^2}\,dt$

10. Find the number(s) $c$ referred to in the mean value theorem for integrals for each
function, over the indicated interval.

(a) $f(x) = 6x^2$ over $[-3, 4]$

(b) $f(x) = x\cos(x)$ over $[0, 3\pi/2]$

11. Find the sum of each sequence or series. 




(c> y; 


n(n ■* 1 ) 


eJ 


12. Find the Taylor polynomial of degree $n = 4$ for each function expanded about the
given value of $x_0$.

(a) $f(x) = \sqrt{x}$, $x_0 = 1$

(b) $f(x) = x^5 + 4x^2 + 3x + 1$, $x_0 = 0$

(c) $f(x) = \cos(x)$, $x_0 = 0$

13. Given that $f(x) = \sin(x)$ and $P(x) = x - x^3/3! + x^5/5! - x^7/7! + x^9/9!$. Show that
$P^{(k)}(0) = f^{(k)}(0)$ for $k = 1, 2, \ldots, 9$.

14. Use synthetic division (Horner's method) to find $P(c)$.

(a) $P(x) = x^4 + x^3 - 13x^2 - x - 12$, $c = 3$

(b) $P(x) = 2x^7 + x^6 + x^5 - 2x^4 - x + 23$, $c = -1$

15. Find the average area of all circles centered at the origin with radii between 1 and 3.

16. Assume that a polynomial, $P(x)$, has $n$ real roots in the interval $[a, b]$. Show that
$P^{(n-1)}(x)$ has at least one real root in the interval $[a, b]$.

17. Assume that $f$, $f'$, and $f''$ are defined on the interval $[a, b]$; $f(a) = f(b) = 0$; and
$f(c) > 0$ for $c \in (a, b)$. Show that there is a number $d \in (a, b)$ such that $f''(d) < 0$.


1.2 Binary Numbers 

Human beings do arithmetic using the decimal (base 10) number system. Most
computers do arithmetic using the binary (base 2) number system. It may seem otherwise,
since communication with the computer (input/output) is in base 10 numbers. This
transparency does not mean that the computer uses base 10. In fact, it converts inputs
to base 2 (or perhaps base 16), then performs base 2 arithmetic, and finally translates
the answer into base 10 before it displays a result. Some experimentation is required
to verify this. One computer with nine decimal digits of accuracy gave the answer

$$\sum_{k=1}^{100{,}000} 0.1 = 9999.99447. \tag{1}$$

Here the intent was to add the number $\frac{1}{10}$ repeatedly 100,000 times. The mathematical
answer is exactly 10,000. One goal is to understand the reason for the computer's
apparently flawed calculation. At the end of this section, it will be shown how something
is lost when the computer translates the decimal fraction $\frac{1}{10}$ into a binary number.
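
The experiment in (1) is easy to repeat on a modern machine. The sketch below uses Python rather than the text's MATLAB, and it runs in double precision (a 53-bit mantissa), so the discrepancy is expected to be far smaller than the 9999.99447 reported for the nine-digit machine above; the exact digits printed depend on the floating-point format and are not claimed here.

```python
import math

total = 0.0
for _ in range(100_000):
    total += 0.1          # 0.1 is stored as the nearest binary double

print(total)              # close to, but not necessarily equal to, 10000
print(total - 10000.0)    # accumulated rounding error
print(math.fsum([0.1] * 100_000))  # error-compensated summation, for comparison
```

The call to `math.fsum` is included only as a point of comparison: it tracks the rounding errors of the individual additions, whereas the plain loop lets them accumulate.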
Binary Numbers

Base 10 numbers are used for most mathematical purposes. For illustration, the number
1563 is expressible in expanded form as

$$1563 = (1 \times 10^3) + (5 \times 10^2) + (6 \times 10^1) + (3 \times 10^0).$$

In general, let $N$ denote a positive integer; then the digits $a_0, a_1, \ldots, a_k$ exist so that
$N$ has the base 10 expansion

$$N = (a_k \times 10^k) + (a_{k-1} \times 10^{k-1}) + \cdots + (a_1 \times 10^1) + (a_0 \times 10^0),$$

where the digits $a_k$ are chosen from $\{0, 1, \ldots, 8, 9\}$. Thus $N$ is expressed in decimal
notation as

$$N = a_k a_{k-1} \cdots a_2 a_1 a_{0\,\text{ten}} \quad \text{(decimal)}. \tag{2}$$

If it is understood that 10 is the base, then (2) is written as

$$N = a_k a_{k-1} \cdots a_2 a_1 a_0.$$

For example, we understand that $1563 = 1563_{\text{ten}}$.
Using powers of 2, the number 1563 can be written

$$\begin{aligned}
1563 &= (1 \times 2^{10}) + (1 \times 2^9) + (0 \times 2^8) + (0 \times 2^7) + (0 \times 2^6) \\
     &\quad + (0 \times 2^5) + (1 \times 2^4) + (1 \times 2^3) + (0 \times 2^2) + (1 \times 2^1) \\
     &\quad + (1 \times 2^0).
\end{aligned} \tag{3}$$

This can be verified by performing the calculation

$$1563 = 1024 + 512 + 16 + 8 + 2 + 1.$$

In general, let $N$ denote a positive integer; the digits $b_0, b_1, \ldots, b_J$ exist so that $N$
has the base 2 expansion

$$N = (b_J \times 2^J) + (b_{J-1} \times 2^{J-1}) + \cdots + (b_1 \times 2^1) + (b_0 \times 2^0), \tag{4}$$

where each digit $b_j$ is either a 0 or 1. Thus $N$ is expressed in binary notation as

$$N = b_J b_{J-1} \cdots b_2 b_1 b_{0\,\text{two}} \quad \text{(binary)}. \tag{5}$$

Using the notation (5) and the result in (3) yields

$$1563 = 11000011011_{\text{two}}.$$

Remarks. The word "two" will always be used as a subscript at the end of a binary
number. This will enable the reader to distinguish binary numbers from the ordinary
base 10 usage. Thus 111 means one hundred eleven, whereas $111_{\text{two}}$ stands for seven.
It is usually the case that the binary representation for $N$ will require more digits
than the decimal representation. This is due to the fact that powers of 2 grow much
more slowly than do powers of 10.

An efficient algorithm for finding the base 2 representation of the integer $N$ can be
derived from equation (4). Dividing both sides of (4) by 2 yields

$$\frac{N}{2} = (b_J \times 2^{J-1}) + (b_{J-1} \times 2^{J-2}) + \cdots + (b_1 \times 2^0) + \frac{b_0}{2}. \tag{6}$$

Hence the remainder, upon dividing $N$ by 2, is the digit $b_0$. Now determine $b_1$. If (6)
is written as $N/2 = Q_0 + b_0/2$, then

$$Q_0 = (b_J \times 2^{J-1}) + (b_{J-1} \times 2^{J-2}) + \cdots + (b_2 \times 2^1) + (b_1 \times 2^0). \tag{7}$$

Now divide both sides of (7) by 2 to get

$$\frac{Q_0}{2} = (b_J \times 2^{J-2}) + (b_{J-1} \times 2^{J-3}) + \cdots + (b_2 \times 2^0) + \frac{b_1}{2}.$$

Hence the remainder, upon dividing $Q_0$ by 2, is the digit $b_1$. This process is continued
and generates sequences $\{Q_k\}$ and $\{b_k\}$ of quotients and remainders, respectively. The
process is terminated when an integer $J$ is found such that $Q_J = 0$. The sequences
obey the following formulas:

$$\begin{aligned}
N &= 2Q_0 + b_0 \\
Q_0 &= 2Q_1 + b_1 \\
&\;\;\vdots \\
Q_{J-2} &= 2Q_{J-1} + b_{J-1} \\
Q_{J-1} &= 2Q_J + b_J \quad (Q_J = 0).
\end{aligned} \tag{8}$$

Example 1.10. Show how to obtain $1563 = 11000011011_{\text{two}}$.

Start with $N = 1563$ and construct the quotients and remainders according to the
equations in (8):

1563 = 2 x 781 + 1,    $b_0 = 1$
 781 = 2 x 390 + 1,    $b_1 = 1$
 390 = 2 x 195 + 0,    $b_2 = 0$
 195 = 2 x  97 + 1,    $b_3 = 1$
  97 = 2 x  48 + 1,    $b_4 = 1$
  48 = 2 x  24 + 0,    $b_5 = 0$
  24 = 2 x  12 + 0,    $b_6 = 0$
  12 = 2 x   6 + 0,    $b_7 = 0$
   6 = 2 x   3 + 0,    $b_8 = 0$
   3 = 2 x   1 + 1,    $b_9 = 1$
   1 = 2 x   0 + 1,    $b_{10} = 1$

Thus the binary representation for 1563 is

$$1563 = b_{10} b_9 b_8 \cdots b_2 b_1 b_{0\,\text{two}} = 11000011011_{\text{two}}.$$

■
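
The repeated-division scheme in (8) translates almost line for line into code. The following is a Python sketch, not the text's MATLAB; the function name `to_binary` is an illustrative assumption, and the built-in `divmod` returns the quotient and remainder of each division by 2.

```python
def to_binary(n):
    """Binary digits of a positive integer via repeated division by 2."""
    digits = []                      # collects b_0, b_1, ..., b_J in that order
    q = n
    while q > 0:
        q, b = divmod(q, 2)          # previous quotient = 2 * (new quotient) + b
        digits.append(b)
    return "".join(str(b) for b in reversed(digits))

print(to_binary(1563))
```

The remainders are produced from least significant to most significant, which is why the list is reversed before it is joined into a string.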

Sequences and Series 

When rational numbers are expressed in decimal form, it is often the case that infinitely
many digits are required. A familiar example is

$$\frac{1}{3} = 0.\overline{3}. \tag{9}$$

Here the symbol $\overline{3}$ means that the digit 3 is repeated forever to form an infinite repeating
decimal. It is understood that 10 is the base in (9). Moreover, it is the mathematical
intent that (9) is the shorthand notation for the infinite series

$$\begin{aligned}
S &= (3 \times 10^{-1}) + (3 \times 10^{-2}) + \cdots + (3 \times 10^{-n}) + \cdots \\
  &= \sum_{k=1}^{\infty} 3(10)^{-k} = \frac{1}{3}.
\end{aligned} \tag{10}$$

If only a finite number of digits is displayed, then an approximation to 1/3 is obtained.
For example, $1/3 \approx 0.333 = 333/1000$. The error in this approximation is 1/3000.
Using (10), the reader can verify that $1/3 = 0.333 + 1/3000$.

It is important to understand the expansion in (10). A naive approach is to multiply
both sides by 10 and then subtract:

$$\begin{aligned}
10S &= 3 + (3 \times 10^{-1}) + (3 \times 10^{-2}) + \cdots + (3 \times 10^{-n}) + \cdots \\
-S &= \phantom{3} - (3 \times 10^{-1}) - (3 \times 10^{-2}) - \cdots - (3 \times 10^{-n}) - \cdots \\
9S &= 3 + (0 \times 10^{-1}) + (0 \times 10^{-2}) + \cdots + (0 \times 10^{-n}) + \cdots
\end{aligned}$$

Therefore, $S = 3/9 = 1/3$. The theorems necessary to justify taking the difference
between two infinite series can be found in most calculus books. A review of a few of
the concepts follows, and the reader may want to refer to a standard text on calculus to
fill in all the details.

Definition 1.6 (Geometric Series). The infinite series

$$\sum_{n=0}^{\infty} c r^n = c + cr + cr^2 + \cdots + cr^n + \cdots, \tag{11}$$

where $c \ne 0$ and $r \ne 0$, is called a geometric series with ratio $r$. ▲

Theorem 1.14 (Geometric Series). The geometric series has the following properties:

$$\text{If } |r| < 1, \text{ then } \sum_{n=0}^{\infty} c r^n = \frac{c}{1 - r}. \tag{12}$$

$$\text{If } |r| \ge 1, \text{ then the series diverges.} \tag{13}$$

Proof. The summation formula for a finite geometric series is

$$S_n = c + cr + cr^2 + \cdots + cr^n = \frac{c(1 - r^{n+1})}{1 - r} \quad \text{for } r \ne 1. \tag{14}$$

To establish (12), observe that

$$|r| < 1 \quad \text{implies that} \quad \lim_{n \to \infty} r^{n+1} = 0. \tag{15}$$

Taking the limit as $n \to \infty$, use (14) and (15) to get

$$\lim_{n \to \infty} S_n = \frac{c}{1 - r} \left(1 - \lim_{n \to \infty} r^{n+1}\right) = \frac{c}{1 - r}.$$

By equation (15) of Section 1.1, the limit above establishes (12).

When $|r| \ge 1$, the sequence $\{r^{n+1}\}$ does not converge. Hence the sequence $\{S_n\}$
in (14) does not tend to a limit. Therefore, (13) is established. •

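Equation (12) in Theorem 1.14 represents an efficient way to convert an infinite
repeating decimal into a fraction.

Example 1.11.

$$\begin{aligned}
0.\overline{3} &= \sum_{k=1}^{\infty} 3(10)^{-k} = -3 + \sum_{k=0}^{\infty} 3(10)^{-k} \\
&= -3 + \frac{3}{1 - \frac{1}{10}} = -3 + \frac{10}{3} = \frac{1}{3}.
\end{aligned}$$

■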


Binary Fractions 

Binary (base 2) fractions can be expressed as sums involving negative powers of 2. If
$R$ is a real number that lies in the range $0 < R < 1$, there exist digits $d_1, d_2, \ldots,$
$d_n, \ldots$ so that

$$R = (d_1 \times 2^{-1}) + (d_2 \times 2^{-2}) + \cdots + (d_n \times 2^{-n}) + \cdots, \tag{16}$$

where $d_j \in \{0, 1\}$. We usually express the quantity on the right side of (16) in the
binary fraction notation

$$R = 0.d_1 d_2 \cdots d_n \cdots_{\text{two}}. \tag{17}$$

There are many real numbers whose binary representation requires infinitely many
digits. The fraction 7/10 can be expressed as 0.7 in base 10, yet its base 2 representation
requires infinitely many digits:

$$\frac{7}{10} = 0.1\overline{0110}_{\text{two}}. \tag{18}$$

The binary fraction in (18) is a repeating fraction where the group of four digits 0110
is repeated forever.

An efficient algorithm for finding base 2 representations can now be developed. If
both sides of (16) are multiplied by 2, the result is

$$2R = d_1 + \left((d_2 \times 2^{-1}) + \cdots + (d_n \times 2^{-n+1}) + \cdots\right). \tag{19}$$

The quantity in parentheses on the right side of (19) is a positive number and is less
than 1. Therefore $d_1$ is the integer part of $2R$, denoted $d_1 = \text{int}(2R)$. To continue the
process, take the fractional part of (19) and write

$$F_1 = \text{frac}(2R) = (d_2 \times 2^{-1}) + \cdots + (d_n \times 2^{-n+1}) + \cdots, \tag{20}$$

where $\text{frac}(2R)$ is the fractional part of the real number $2R$. Multiplication of both
sides of (20) by 2 results in

$$2F_1 = d_2 + \left((d_3 \times 2^{-1}) + \cdots + (d_n \times 2^{-n+2}) + \cdots\right). \tag{21}$$

Now take the integer part of (21) and obtain $d_2 = \text{int}(2F_1)$.

The process is continued, possibly ad infinitum (if $R$ has an infinite nonrepeating
base 2 representation), and two sequences $\{d_k\}$ and $\{F_k\}$ are recursively generated:

$$\begin{aligned}
d_k &= \text{int}(2F_{k-1}), \\
F_k &= \text{frac}(2F_{k-1}),
\end{aligned} \tag{22}$$

where $d_1 = \text{int}(2R)$ and $F_1 = \text{frac}(2R)$. The binary decimal representation of $R$ is
then given by the convergent geometric series

$$R = \sum_{j=1}^{\infty} d_j (2)^{-j}.$$

Example 1.12. The binary decimal representation of 7/10 given in (18) was found using
the formulas in (22).

Let $R = 7/10 = 0.7$; then

$2R = 1.4$       $d_1 = \text{int}(1.4) = 1$     $F_1 = \text{frac}(1.4) = 0.4$
$2F_1 = 0.8$     $d_2 = \text{int}(0.8) = 0$     $F_2 = \text{frac}(0.8) = 0.8$
$2F_2 = 1.6$     $d_3 = \text{int}(1.6) = 1$     $F_3 = \text{frac}(1.6) = 0.6$
$2F_3 = 1.2$     $d_4 = \text{int}(1.2) = 1$     $F_4 = \text{frac}(1.2) = 0.2$
$2F_4 = 0.4$     $d_5 = \text{int}(0.4) = 0$     $F_5 = \text{frac}(0.4) = 0.4$
$2F_5 = 0.8$     $d_6 = \text{int}(0.8) = 0$     $F_6 = \text{frac}(0.8) = 0.8$
$2F_6 = 1.6$     $d_7 = \text{int}(1.6) = 1$     $F_7 = \text{frac}(1.6) = 0.6$

Note that $2F_2 = 1.6 = 2F_6$. The patterns $d_k = d_{k+4}$ and $F_k = F_{k+4}$ will occur for
$k = 2, 3, 4, \ldots$. Thus $7/10 = 0.1\overline{0110}_{\text{two}}$. ■
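
The recursion (22) can be carried out exactly with rational arithmetic, which avoids the very rounding effects this section is about. The sketch below is in Python (the text uses MATLAB); the function name `binary_fraction_digits` is an illustrative assumption, and `fractions.Fraction` is used so that the doubling and the integer and fractional parts are computed without error.

```python
from fractions import Fraction

def binary_fraction_digits(r, count):
    """First `count` binary digits d_1, d_2, ... of a number 0 < r < 1."""
    digits = []
    f = Fraction(r)
    for _ in range(count):
        f *= 2
        d = int(f)        # d_k = int(2 F_{k-1})
        f -= d            # F_k = frac(2 F_{k-1})
        digits.append(d)
    return digits

print(binary_fraction_digits(Fraction(7, 10), 13))
```

Passing a `Fraction` such as `Fraction(7, 10)` matters here: passing the float `0.7` would instead expand the binary number that the machine actually stores for 0.7.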

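Geometric series can be used to find the base 10 rational number that a binary
number represents.

Example 1.13. Find the base 10 rational number that the binary number $0.\overline{01}_{\text{two}}$
represents. In expanded form,

$$\begin{aligned}
0.\overline{01}_{\text{two}} &= (0 \times 2^{-1}) + (1 \times 2^{-2}) + (0 \times 2^{-3}) + (1 \times 2^{-4}) + \cdots \\
&= \sum_{k=1}^{\infty} (2^{-2})^k = -1 + \sum_{k=0}^{\infty} (2^{-2})^k \\
&= -1 + \frac{1}{1 - \frac{1}{4}} = -1 + \frac{4}{3} = \frac{1}{3}.
\end{aligned}$$

■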

Binary Shifting 

If a rational number that is equivalent to an infinite repeating binary expansion is to be
found, then a shift in the digits can be helpful. For example, let $S$ be given by

$$S = 0.00000\overline{11000}_{\text{two}}. \tag{23}$$

Multiplying both sides of (23) by $2^5$ will shift the binary point five places to the right,
and $32S$ has the form

$$32S = 0.\overline{11000}_{\text{two}}. \tag{24}$$

Similarly, multiplying both sides of (23) by $2^{10}$ will shift the binary point ten places to
the right and $1024S$ has the form

$$1024S = 11000.\overline{11000}_{\text{two}}. \tag{25}$$

The result of naively taking the differences between the left- and right-hand sides of
(24) and (25) is $992S = 11000_{\text{two}}$ or $992S = 24$, since $11000_{\text{two}} = 24$. Therefore,
$S = 24/992 = 3/124$.
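
The shifting argument amounts to a small formula: a block of $m$ repeating binary digits with integer value $B$ contributes $B/(2^m - 1)$, shifted right by the length of the non-repeating prefix. The following Python sketch (an illustration, not the text's MATLAB; the function name and its string arguments are assumptions) applies that formula with exact fractions.

```python
from fractions import Fraction

def repeating_binary(prefix, block):
    """Exact value of the binary fraction 0.<prefix><block><block>... ."""
    p, m = len(prefix), len(block)
    head = Fraction(int(prefix, 2), 2**p) if prefix else Fraction(0)
    # The repeating block is worth block / (2^m - 1), shifted p places right.
    tail = Fraction(int(block, 2), 2**m - 1) / 2**p
    return head + tail

S = repeating_binary("00000", "11000")
print(S)
```

The same helper can be used to check the other repeating binary fractions in this section, for example `repeating_binary("1", "0110")` for 7/10.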

Scientific Notation 

A standard way to present a real number, called scientific notation, is obtained by
shifting the decimal point and supplying an appropriate power of 10. For example,

$$\begin{aligned}
0.0000747 &= 7.47 \times 10^{-5}, \\
31.4159265 &= 3.14159265 \times 10, \\
9{,}700{,}000{,}000 &= 9.7 \times 10^{9}.
\end{aligned}$$

In chemistry, an important constant is Avogadro's number, which is $6.02252 \times 10^{23}$. It
is the number of atoms in the gram atomic weight of an element. In computer science,
$1\text{K} = 1.024 \times 10^3$.

Table 1.3 Decimal Equivalents for a Set of Binary Numbers with 4-Bit Mantissa and
Exponent of $n = -3, -2, \ldots, 3, 4$



Machine Numbers 

Computers use a normalized floating-point binary representation for real numbers.
This means that the mathematical quantity $x$ is not actually stored in the computer.
Instead, the computer stores a binary approximation to $x$:

$$x \approx \pm q \times 2^n. \tag{26}$$

The number $q$ is the mantissa and it is a finite binary expression satisfying the
inequality $1/2 \le q < 1$. The integer $n$ is called the exponent.

In a computer, only a small subset of the real number system is used. Typically, this
subset contains only a portion of the binary numbers suggested by (26). The number
of binary digits is restricted in both the numbers $q$ and $n$. For example, consider the
set of all positive real numbers of the form

$$0.d_1 d_2 d_3 d_{4\,\text{two}} \times 2^n, \tag{27}$$

where $d_1 = 1$ and $d_2$, $d_3$, and $d_4$ are either 0 or 1, and $n \in \{-3, -2, -1, 0, 1, 2, 3, 4\}$.
There are eight choices for the mantissa and eight choices for the exponent in (27), and
this produces a set of 64 numbers:

$$\{0.1000_{\text{two}} \times 2^{-3},\ 0.1001_{\text{two}} \times 2^{-3},\ \ldots,\ 0.1110_{\text{two}} \times 2^{4},\ 0.1111_{\text{two}} \times 2^{4}\}. \tag{28}$$

The decimal forms of these 64 numbers are given in Table 1.3. It is important to learn
that when the mantissa and exponent in (27) are restricted, the computer has a limited
number of values it chooses from to store as an approximation to the real numbers.
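
The set (28) is small enough to enumerate. The following Python sketch (illustrative, not the text's MATLAB) builds the 64 numbers as exact fractions, using the fact that the mantissas $0.1000_{\text{two}}$ through $0.1111_{\text{two}}$ are $8/16, 9/16, \ldots, 15/16$; the variable names are assumptions of this sketch.

```python
from fractions import Fraction

# Mantissas 0.1000_two through 0.1111_two are 8/16, 9/16, ..., 15/16.
mantissas = [Fraction(m, 16) for m in range(8, 16)]
exponents = range(-3, 5)          # n = -3, -2, ..., 3, 4

machine_numbers = sorted(q * Fraction(2) ** n for n in exponents for q in mantissas)

print(len(machine_numbers))
print(float(machine_numbers[0]), float(machine_numbers[-1]))
print([float(x) for x in machine_numbers[:8]])   # the numbers with exponent -3
```

Printing the sorted list shows that the spacing between neighbouring machine numbers doubles each time the exponent increases by one.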

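What would happen if a computer had only a 4-bit mantissa and was restricted
to performing the computation $\left(\frac{1}{10} + \frac{1}{5}\right) + \frac{1}{6}$? Assume that the computer rounds all real
numbers to the closest binary number in Table 1.3. At each step the reader can look at
the table to see that the best approximation is being used.

(29)
\[
\begin{array}{rll}
\frac{1}{10} & \approx 0.1101_{\text{two}} \times 2^{-3} & = 0.01101_{\text{two}} \times 2^{-2} \\
\frac{1}{5} & \approx 0.1101_{\text{two}} \times 2^{-2} & = 0.1101_{\text{two}} \times 2^{-2} \\ \hline
\frac{3}{10} & & \phantom{=}\ 1.00111_{\text{two}} \times 2^{-2}
\end{array}
\]

The computer must decide how to store the number $1.00111_{\text{two}} \times 2^{-2}$. Assume that it
is rounded to $0.1010_{\text{two}} \times 2^{-1}$. The next step is

(30)
\[
\begin{array}{rll}
\frac{3}{10} & \approx 0.1010_{\text{two}} \times 2^{-1} & = 0.1010_{\text{two}} \times 2^{-1} \\
\frac{1}{6} & \approx 0.1011_{\text{two}} \times 2^{-2} & = 0.01011_{\text{two}} \times 2^{-1} \\ \hline
\frac{7}{15} & & \phantom{=}\ 0.11111_{\text{two}} \times 2^{-1}
\end{array}
\]

The computer must decide how to store the number $0.11111_{\text{two}} \times 2^{-1}$. Since rounding
is assumed to take place, it stores $0.10000_{\text{two}} \times 2^{0}$. Therefore, the computer's solution
to the addition problem is

(31) $\frac{7}{15} \approx 0.10000_{\text{two}} \times 2^{0}$.

The error in the computer's calculation is

(32) $\frac{7}{15} - 0.10000_{\text{two}} \approx 0.466667 - 0.500000 \approx -0.033333$.

Expressed as a percentage of 7/15, the magnitude of this error amounts to 7.14%.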
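The hand computation can be replayed exactly. This Python sketch is illustrative only: the helper `fl` and the rule that ties round to the larger value are assumptions, chosen so that the final step agrees with the rounding described above.

```python
from fractions import Fraction

# The 64 positive machine numbers of form (27), as exact fractions.
MACHINE = sorted(Fraction(k, 16) * Fraction(2) ** n
                 for n in range(-3, 5) for k in range(8, 16))

def fl(x):
    """Round x to the nearest machine number; ties go to the larger value (assumed)."""
    return min(MACHINE, key=lambda m: (abs(m - x), -m))

a = fl(Fraction(1, 10))    # 13/128 = 0.1101_two x 2^-3
b = fl(Fraction(1, 5))     # 13/64  = 0.1101_two x 2^-2
s1 = fl(a + b)             # 39/128 rounds to 5/16 = 0.1010_two x 2^-1
c = fl(Fraction(1, 6))     # 11/64  = 0.1011_two x 2^-2
s2 = fl(s1 + c)            # 31/64 lies midway between 15/32 and 1/2

exact = Fraction(7, 15)
error = exact - s2
print(s2, float(error), float(abs(error) / exact))
```

Note that the last sum, 31/64, lies exactly halfway between two machine numbers, so the result depends on the tie-breaking rule.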

Computer Accuracy 

To store numbers accurately, computers must have floating-point binary numbers with 
at least 24 binary bits used for the mantissa; this translates to about seven decimal 
places. If a 32-bit mantissa is used, numbers with nine decimal places can be stored. 
Now, again, consider the difficulty encountered in (1) at the beginning of the section, 
when a computer added 1/10 repeatedly. 

Suppose that the mantissa q in (26) contains 32 binary bits. The condition $1/2 \le q$
implies that the first digit is $d_1 = 1$. Hence q has the form

(33) $q = 0.1 d_2 d_3 \cdots d_{31} d_{32\,\text{two}}$.

When fractions are represented in binary form, it is often the case that infinitely
many digits are required. An example is

(34) $\frac{1}{10} = 0.0\overline{0011}_{\text{two}}$.

When the 32-bit mantissa is used, truncation occurs and the computer uses the internal
approximation

(35) $\frac{1}{10} \approx 0.11001100110011001100110011001100_{\text{two}} \times 2^{-3}$.

The error in the approximation in (35), that is, the difference between (34) and (35), is

(36) $0.\overline{1100}_{\text{two}} \times 2^{-35} \approx 2.328306437 \times 10^{-11}$.

Because of (36), the computer must be in error when it sums the 100,000 addends
of 1/10 in (1). The error must be greater than $(100{,}000)(2.328306437 \times 10^{-11}) =
2.328306437 \times 10^{-6}$. Indeed, there is a much larger error. Occasionally, the partial
sum could be rounded up or down. Also, as the sum grows, the later addends of 1/10
are small compared to the current size of the sum, and their contribution is truncated
more severely. The compounding effect of these errors actually produced the error
$10{,}000 - 9999.99447 = 5.53 \times 10^{-3}$.
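The size of the truncation error in (36) can be confirmed with exact rational arithmetic. In this Python sketch the variable names are illustrative, and the chopping rule (discard all bits beyond the 32nd mantissa bit) is the one described above.

```python
import math
from fractions import Fraction

BITS = 32
tenth = Fraction(1, 10)

# Normalize 1/10 = q x 2^-3 with 1/2 <= q < 1, so q = 4/5.
q = tenth * 2 ** 3
q_chopped = Fraction(math.floor(q * 2 ** BITS), 2 ** BITS)  # keep 32 mantissa bits

stored = q_chopped / 2 ** 3            # the machine value used in place of 1/10
error = tenth - stored                 # exact truncation error
lower_bound = 100_000 * error          # error if every addend lost only this much

print(float(error), float(lower_bound))
```

The exact error is $(4/5) \times 2^{-35}$, in agreement with $0.\overline{1100}_{\text{two}} \times 2^{-35}$.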


Computer Floating-point Numbers 

Computers have both an integer mode and a floating-point mode for representing numbers.
The integer mode is used for performing calculations that are known to be integer
valued and has limited usage for numerical analysis. Floating-point numbers are used
for scientific and engineering applications. It must be understood that any computer
implementation of equation (26) places restrictions on the number of digits used in the
mantissa q, and that the range of possible exponents n must be limited.

Computers that use 32 bits to represent single-precision real numbers use 8 bits
for the exponent and 24 bits for the mantissa. They can represent real numbers with
magnitudes in the range

2.938736E-39 to 1.701412E+38

(i.e., $2^{-128}$ to $2^{127}$) with six decimal digits of numerical precision (e.g., $2^{-23} = 1.2 \times 10^{-7}$).

Computers that use 48 bits to represent single-precision real numbers might use
8 bits for the exponent and 40 bits for the mantissa. They can represent real numbers
in the range

2.9387358771E-39 to 1.7014118346E+38

(i.e., $2^{-128}$ to $2^{127}$) with 11 decimal digits of numerical precision (e.g., $2^{-39} = 1.8 \times 10^{-12}$).

If the computer has 64-bit double-precision real numbers, it might use 11 bits for
the exponent and 53 bits for the mantissa. It can represent real numbers in the range

5.562684646268003E-309 to 8.988465674311580E+307

(i.e., $2^{-1024}$ to $2^{1023}$) with about 16 decimal digits of numerical precision (e.g., $2^{-52} = 2.2 \times 10^{-16}$).
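On a present-day machine the corresponding parameters can be inspected rather than assumed. The sketch below uses Python's standard `sys.float_info`. It reports the IEEE 754 double format found on most current hardware, whose exponent limits differ slightly from the 64-bit layout described above.

```python
import sys

info = sys.float_info
print("mantissa bits   :", info.mant_dig)   # 53 for IEEE 754 doubles
print("machine epsilon :", info.epsilon)    # 2**-52, about 2.2e-16
print("largest value   :", info.max)
print("smallest normal :", info.min)
print("decimal digits  :", info.dig)
```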
1. Use a computer to accumulate the following sums. The intent is to have the computer
do repeated subtractions. Do not use the multiplication shortcut.

(a) $10{,}000 - \sum_{k=1}^{100{,}000} 0.1$ (b) $10{,}000 - \sum_{k=1}^{80{,}000} 0.125$

2. Use equations (4) and (5) to convert the following binary numbers to decimal
(base 10) form.

(a) $10101_{\text{two}}$ (b) $111000_{\text{two}}$

(c) $11111110_{\text{two}}$ (d) $1000000111_{\text{two}}$

3. Use equations (16) and (17) to convert the following binary fractions to decimal
(base 10) form.

(a) $0.11011_{\text{two}}$ (b) $0.10101_{\text{two}}$

(c) $0.101010_{\text{two}}$ (d) $0.110110110_{\text{two}}$

4. Convert the following binary numbers to decimal (base 10) form.

approximations, that is, find

(a) $\sqrt{2} - 1.0110101_{\text{two}}$ (Use $\sqrt{2} = 1.41421356237309\cdots$)

(b) $\pi - 11.0010010001_{\text{two}}$ (Use $\pi = 3.14159265358979\cdots$)

6. Follow Example 1.10 and convert the following to binary numbers.

(a) 23 (b) 87 (c) 378 (d) 2388

7. Follow Example 1.12 and convert the following to a binary fraction of the form
$0.d_1 d_2 \cdots d_{n\,\text{two}}$.

(a) 7/16 (b) 13/16 (c) 23/32 (d) 75/128

8. Follow Example 1.12 and convert the following to an infinite repeating binary fraction.

(a) 1/10 (b) 1/3 (c) 1/7

9. For the following seven-digit binary approximations, find the error in the approximation
$R - 0.d_1 d_2 d_3 d_4 d_5 d_6 d_{7\,\text{two}}$.

(a) $1/10 \approx 0.0001100_{\text{two}}$ (b) $1/7 \approx 0.0010010_{\text{two}}$

10. Show that the binary expansion $1/7 = 0.\overline{001}_{\text{two}}$ is equivalent to
$\frac{1}{7} = \frac{1}{8} + \frac{1}{64} + \frac{1}{512} + \cdots$. Use Theorem 1.14 to establish this expansion.

11. Show that the binary expansion $1/5 = 0.\overline{0011}_{\text{two}}$ is equivalent to
$\frac{1}{5} = \frac{3}{16} + \frac{3}{256} + \frac{3}{4096} + \cdots$. Use Theorem 1.14 to establish this expansion.

12. Prove that any number $2^{-N}$, where N is a positive integer, can be represented as a
decimal number that has N digits, that is, $2^{-N} = 0.d_1 d_2 d_3 \cdots d_N$. Hint. $1/2 = 0.5$,
$1/4 = 0.25$, ....
13. Use Table 1.3 to determine what happens when a computer with a 4-bit mantissa
performs the following calculations.

(l + ?) + S <**) (A + $) + 3 

(c) (n + g) + 7 W> (A + 5) + 1 

14. Show that when 2 is replaced by 3 in all the formulas in (8) the result is a method for
finding the base 3 expansion of a positive integer. Express the following integers in
base 3.

(a) 10 (b) 23 (c) 421 (d) 1784

15. Show that when 2 is replaced by 3 in (22) the result is a method for finding the base 3
expansion of a positive number R that lies in the interval 0 < R < 1. Express the
following numbers in base 3.

(a) 1/3 (b) 1/2 (c) 1/10 (d) 11/27

16. Show that when 2 is replaced by 5 in all the formulas in (8) the result is a method for
finding the base 5 expansion of a positive integer. Express the following integers in
base 5.

(a) 10 (b) 35 (c) 721 (d) 734

17. Show that when 2 is replaced by 5 in (22) the result is a method for finding the base 5
expansion of a positive number R that lies in the interval 0 < R < 1. Express the
following numbers in base 5.

(a) 1/3 (b) 1/2 (c) 1/10 (d) 154/625


1.3 Error Analysis 

In the practice of numerical analysis it is important to be aware that computed solutions 
are not exact mathematical solutions. The precision of a numerical solution can be
diminished in several subtle ways. Understanding these difficulties can often guide the 
practitioner in the proper implementation and/or development of numerical algorithms. 

Definition 1.7. Suppose that $\hat{p}$ is an approximation to $p$. The absolute error is
$E_p = |p - \hat{p}|$, and the relative error is $R_p = |p - \hat{p}|/|p|$, provided that $p \ne 0$.

The error is simply the difference between the true value and the approximate
value, whereas the relative error expresses that difference as a fraction of the true value.

Example 1.14. Find the error and relative error in the following three cases. Let $x =
3.141592$ and $\hat{x} = 3.14$; then the error is

(1a) $E_x = |x - \hat{x}| = |3.141592 - 3.14| = 0.001592$,

and the relative error is

$R_x = \frac{|x - \hat{x}|}{|x|} = \frac{0.001592}{3.141592} = 0.000507$.

Let $y = 1{,}000{,}000$ and $\hat{y} = 999{,}996$; then the error is

(1b) $E_y = |y - \hat{y}| = |1{,}000{,}000 - 999{,}996| = 4$,

and the relative error is

$R_y = \frac{|y - \hat{y}|}{|y|} = \frac{4}{1{,}000{,}000} = 0.000004$.

Let $z = 0.000012$ and $\hat{z} = 0.000009$; then the error is

(1c) $E_z = |z - \hat{z}| = |0.000012 - 0.000009| = 0.000003$,

and the relative error is

$R_z = \frac{|z - \hat{z}|}{|z|} = \frac{0.000003}{0.000012} = 0.25$.

In case (1a), there is not too much difference between $E_x$ and $R_x$, and either could
be used to determine the accuracy of $\hat{x}$. In case (1b), the value of y is of magnitude $10^{6}$,
the error $E_y$ is large, and the relative error $R_y$ is small. In this case, $\hat{y}$ would probably
be considered a good approximation to y. In case (1c), z is of magnitude $10^{-6}$ and
the error $E_z$ is the smallest of all three cases, but the relative error $R_z$ is the largest.
In terms of percentage, it amounts to 25%, and thus $\hat{z}$ is a bad approximation to z.
Observe that as $|p|$ moves away from 1 (greater than or less than) the relative error $R_p$
is a better indicator of the accuracy of the approximation than $E_p$. Relative error is
preferred for floating-point representations since it deals directly with the mantissa.


Definition 1.8. The number $\hat{p}$ is said to approximate $p$ to d significant digits if d is
the largest nonnegative integer for which

(2) $\frac{|p - \hat{p}|}{|p|} < \frac{10^{-d}}{2}$.

Example 1.15. Determine the number of significant digits for the approximations in
Example 1.14.

(3a) If $x = 3.141592$ and $\hat{x} = 3.14$, then $|x - \hat{x}|/|x| = 0.000507 < 10^{-2}/2$. Therefore,
$\hat{x}$ approximates x to two significant digits.

(3b) If $y = 1{,}000{,}000$ and $\hat{y} = 999{,}996$, then $|y - \hat{y}|/|y| = 0.000004 < 10^{-5}/2$.
Therefore, $\hat{y}$ approximates y to five significant digits.

(3c) If $z = 0.000012$ and $\hat{z} = 0.000009$, then $|z - \hat{z}|/|z| = 0.25 < 10^{0}/2$. Therefore, $\hat{z}$
approximates z to no significant digits.
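Definitions 1.7 and 1.8 translate directly into a few lines of code. The following Python sketch is one possible rendering: the function names, the `max_digits` cap, and the choice to return `None` when no d satisfies (2) are all assumptions made for illustration.

```python
def absolute_error(p, p_hat):
    """E_p = |p - p_hat|."""
    return abs(p - p_hat)

def relative_error(p, p_hat):
    """R_p = |p - p_hat| / |p|, defined only for p != 0."""
    if p == 0:
        raise ValueError("relative error is undefined for p = 0")
    return abs(p - p_hat) / abs(p)

def significant_digits(p, p_hat, max_digits=17):
    """Largest d >= 0 with R_p < 10**(-d) / 2, or None if even d = 0 fails."""
    r = relative_error(p, p_hat)
    if r >= 0.5:
        return None
    d = 0
    while d < max_digits and r < 10.0 ** (-(d + 1)) / 2:
        d += 1
    return d

cases = [(3.141592, 3.14), (1_000_000, 999_996), (0.000012, 0.000009)]
for p, p_hat in cases:
    print(p, p_hat, absolute_error(p, p_hat),
          relative_error(p, p_hat), significant_digits(p, p_hat))
```

Applied to the three cases of Example 1.14, the last column gives 2, 5, and 0 significant digits.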
Truncation Error

The notion of truncation error usually refers to errors introduced when a more
complicated mathematical expression is "replaced" with a more elementary formula. This
terminology originates from the technique of replacing a complicated function with a
truncated Taylor series. For example, the infinite Taylor series

$e^{x^2} = 1 + x^2 + \frac{x^4}{2!} + \frac{x^6}{3!} + \frac{x^8}{4!} + \cdots + \frac{x^{2n}}{n!} + \cdots$

might be replaced with just the first five terms $1 + x^2 + \frac{x^4}{2!} + \frac{x^6}{3!} + \frac{x^8}{4!}$. This might be
done when approximating an integral numerically.

Example 1.16. Given that $\int_0^{1/2} e^{x^2}\,dx = 0.544987104184 = p$, determine the accuracy
of the approximation obtained by replacing the integrand $f(x) = e^{x^2}$ with the truncated
Taylor series $P_8(x) = 1 + x^2 + \frac{x^4}{2!} + \frac{x^6}{3!} + \frac{x^8}{4!}$.

Term-by-term integration produces

\[
\begin{aligned}
\int_0^{1/2} \left(1 + x^2 + \frac{x^4}{2!} + \frac{x^6}{3!} + \frac{x^8}{4!}\right) dx
&= \left( x + \frac{x^3}{3} + \frac{x^5}{5(2!)} + \frac{x^7}{7(3!)} + \frac{x^9}{9(4!)} \right) \Bigg|_{x=0}^{x=1/2} \\
&= \frac{1}{2} + \frac{1}{24} + \frac{1}{320} + \frac{1}{5376} + \frac{1}{110{,}592} \\
&= \frac{2{,}109{,}491}{3{,}870{,}720} = 0.544986720817 = \hat{p}.
\end{aligned}
\]

Since $10^{-5}/2 > |p - \hat{p}|/|p| = 7.03442 \times 10^{-7} > 10^{-6}/2$, the approximation $\hat{p}$ agrees with
the true answer $p = 0.544987104184$ to five significant digits. The graphs of $y = f(x) = e^{x^2}$
and $y = P_8(x)$ and the area under the curve for $0 \le x \le 1/2$ are shown in Figure 1.7.
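The arithmetic in Example 1.16 can be checked with exact fractions. This Python sketch is illustrative: the variable names are assumptions, and `p` is simply the value of the integral quoted in the example.

```python
from fractions import Fraction
from math import factorial

# Integrate 1 + x^2 + x^4/2! + x^6/3! + x^8/4! term by term over [0, 1/2]:
# the k-th term x^(2k)/k! integrates to x^(2k+1) / ((2k+1) k!).
upper = Fraction(1, 2)
p_hat = sum(upper ** (2 * k + 1) / ((2 * k + 1) * factorial(k)) for k in range(5))

p = 0.544987104184                     # value of the integral quoted in the text
rel_err = abs(p - float(p_hat)) / p
print(p_hat, float(p_hat), rel_err)
```

The relative error falls between $10^{-6}/2$ and $10^{-5}/2$, which is the condition for five significant digits in Definition 1.8.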
Round-Off Error

A computer's representation of real numbers is limited to the fixed precision of the
mantissa. True values are sometimes not stored exactly by a computer's representation.
This is called round-off error. In the preceding section the real number
$1/10 = 0.0\overline{0011}_{\text{two}}$ was truncated when it was stored in a computer. The actual number
that is stored in the computer may undergo chopping or rounding of the last digit.
Therefore, since the computer hardware works with only a limited number of digits in
machine numbers, rounding errors are introduced and propagated in successive
computations.


Chopping Off Versus Rounding Off

Consider any real number p that is expressed in a normalized decimal form:

(4) $p = \pm 0.d_1 d_2 d_3 \ldots d_k d_{k+1} \ldots \times 10^{n}$,

where $1 \le d_1 \le 9$ and $0 \le d_j \le 9$ for $j > 1$. Suppose that k is the maximum number
of decimal digits carried in the floating-point computations of a computer; then the real
number p is represented by $fl_{chop}(p)$, which is given by

(5) $fl_{chop}(p) = \pm 0.d_1 d_2 d_3 \ldots d_k \times 10^{n}$,

where $1 \le d_1 \le 9$ and $0 \le d_j \le 9$ for $1 < j \le k$. The number $fl_{chop}(p)$ is called
the chopped floating-point representation of p. In this case the kth digit of $fl_{chop}(p)$
agrees with the kth digit of p. An alternative k-digit representation is the rounded
floating-point representation $fl_{round}(p)$, which is given by

(6) $fl_{round}(p) = \pm 0.d_1 d_2 d_3 \ldots r_k \times 10^{n}$,

where $1 \le d_1 \le 9$ and $0 \le d_j \le 9$ for $1 < j < k$ and the last digit, $r_k$, is obtained
by rounding the number $d_k.d_{k+1}d_{k+2}\cdots$ to the nearest integer. For example, the real
number

$p = \frac{22}{7} = 3.142857142857142857\ldots$

has the following six-digit representations:

$fl_{chop}(p) = 0.314285 \times 10^{1}$,

$fl_{round}(p) = 0.314286 \times 10^{1}$.

For common purposes the chopping and rounding would be written as 3.14285 and
3.14286, respectively. The reader should note that essentially all computers use some
form of the rounded floating-point representation method.
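Chopping and rounding to k decimal digits can be imitated with Python's standard `decimal` module, whose contexts carry a precision and a rounding mode. The helper name `fl` below is an assumption made for this sketch.

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_UP

def fl(numerator, denominator, digits, rounding):
    """k-digit decimal floating-point value of numerator/denominator."""
    ctx = Context(prec=digits, rounding=rounding)
    return ctx.divide(Decimal(numerator), Decimal(denominator))

chopped = fl(22, 7, 6, ROUND_DOWN)      # digits past the sixth are discarded
rounded = fl(22, 7, 6, ROUND_HALF_UP)   # sixth digit is rounded to nearest
print(chopped, rounded)
```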
Loss of Significance

Consider the two numbers $p = 3.1415926536$ and $q = 3.1415957341$, which are
nearly equal and both carry 11 decimal digits of precision. Suppose that their
difference is formed: $p - q = -0.0000030805$. Since the first six digits of p and q are
the same, their difference $p - q$ contains only five decimal digits of precision. This
phenomenon is called loss of significance or subtractive cancellation. This reduction
in the precision of the final computed answer can creep in when it is not suspected.

Example 1.17. Compare the results of calculating f(500) and g(500) using six digits
and rounding. The functions are $f(x) = x\left(\sqrt{x+1} - \sqrt{x}\right)$ and $g(x) = \frac{x}{\sqrt{x+1} + \sqrt{x}}$. For the
first function,

\[
\begin{aligned}
f(500) &= 500\left(\sqrt{501} - \sqrt{500}\right) \\
&= 500(22.3830 - 22.3607) = 500(0.0223) = 11.1500.
\end{aligned}
\]

For g(x),

\[
\begin{aligned}
g(500) &= \frac{500}{\sqrt{501} + \sqrt{500}} \\
&= \frac{500}{22.3830 + 22.3607} = \frac{500}{44.7437} = 11.1748.
\end{aligned}
\]

The second function, g(x), is algebraically equivalent to f(x), as shown by the
computation

\[
\begin{aligned}
f(x) &= \frac{x\left(\sqrt{x+1} - \sqrt{x}\right)\left(\sqrt{x+1} + \sqrt{x}\right)}{\sqrt{x+1} + \sqrt{x}} \\
&= \frac{x\left(\left(\sqrt{x+1}\right)^2 - \left(\sqrt{x}\right)^2\right)}{\sqrt{x+1} + \sqrt{x}} \\
&= \frac{x}{\sqrt{x+1} + \sqrt{x}}.
\end{aligned}
\]

The answer, g(500) = 11.1748, involves less error and is the same as that obtained by
rounding the true answer 11.174755300747198... to six digits.
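Six-digit rounding arithmetic can be simulated by rounding after every operation. The Python sketch below does this with a `decimal` context of precision 6. Using the `decimal` module as a model of the six-digit machine is an assumption made for illustration.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

ctx = Context(prec=6, rounding=ROUND_HALF_UP)   # six significant digits
ONE = Decimal(1)

def f(x):
    """x * (sqrt(x + 1) - sqrt(x)), every operation rounded to six digits."""
    diff = ctx.subtract(ctx.sqrt(ctx.add(x, ONE)), ctx.sqrt(x))
    return ctx.multiply(x, diff)

def g(x):
    """x / (sqrt(x + 1) + sqrt(x)), every operation rounded to six digits."""
    total = ctx.add(ctx.sqrt(ctx.add(x, ONE)), ctx.sqrt(x))
    return ctx.divide(x, total)

x = Decimal(500)
print(f(x), g(x))
```

The subtraction inside `f` leaves only three significant digits (0.0223), which is where the accuracy is lost. `g` involves no subtraction of nearly equal numbers.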

The reader is encouraged to study Exercise 12 on how to avoid loss of significance 
in the quadratic formula. The next example shows that a truncated Taylor series will 
sometimes help avoid the loss of significance error. 

Example 1.18. Compare the results of calculating f(0.01) and P(0.01) using six digits
and rounding, where

$f(x) = \frac{e^x - 1 - x}{x^2}$ and $P(x) = \frac{1}{2} + \frac{x}{6} + \frac{x^2}{24}$.

The function P(x) is the Taylor polynomial of degree n = 2 for f(x) expanded about
x = 0.

For the first function

$f(0.01) = \frac{e^{0.01} - 1 - 0.01}{(0.01)^2} = \frac{1.010050 - 1 - 0.01}{0.0001} = 0.5$.

For the second function

\[
\begin{aligned}
P(0.01) &= \frac{1}{2} + \frac{0.01}{6} + \frac{0.0001}{24} \\
&= 0.5 + 0.001667 + 0.000004 = 0.501671.
\end{aligned}
\]

The answer P(0.01) = 0.501671 contains less error and is the same as that obtained by
rounding the true answer 0.50167084168057542... to six digits.
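The same comparison can be run in simulated six-digit arithmetic. In this Python sketch the `decimal` context is an assumed model of the six-digit machine. It keeps six significant digits after every operation, so some intermediate values carry more decimal places than the hand computation shows, but the final results agree.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

ctx = Context(prec=6, rounding=ROUND_HALF_UP)   # six significant digits

def f(x):
    """(e^x - 1 - x) / x^2 with each operation rounded to six digits."""
    numerator = ctx.subtract(ctx.subtract(ctx.exp(x), Decimal(1)), x)
    return ctx.divide(numerator, ctx.multiply(x, x))

def P(x):
    """Taylor polynomial 1/2 + x/6 + x^2/24 in the same arithmetic."""
    linear = ctx.divide(x, Decimal(6))
    quadratic = ctx.divide(ctx.multiply(x, x), Decimal(24))
    return ctx.add(ctx.add(Decimal("0.5"), linear), quadratic)

x = Decimal("0.01")
print(f(x), P(x))
```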

For polynomial evaluation, the rearrangement of terms into nested multiplication form 
will sometimes produce a better result. 


Example 1.19. Let $P(x) = x^3 - 3x^2 + 3x - 1$ and $Q(x) = ((x - 3)x + 3)x - 1$.
Use three-digit rounding arithmetic to compute approximations to P(2.19) and Q(2.19).
Compare them with the true values, P(2.19) = Q(2.19) = 1.685159.

\[
\begin{aligned}
P(2.19) &\approx (2.19)^3 - 3(2.19)^2 + 3(2.19) - 1 \\
&= 10.5 - 14.4 + 6.57 - 1 = 1.67. \\
Q(2.19) &\approx ((2.19 - 3)2.19 + 3)2.19 - 1 = 1.69.
\end{aligned}
\]

The errors are 0.015159 and -0.004841, respectively. Thus the approximation $Q(2.19) \approx
1.69$ has less error. Exercise 6 explores the situation near the root of this polynomial.
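Both evaluations can be reproduced by rounding every intermediate result to three significant digits. The Python sketch below uses a `decimal` context for this. The order in which the terms of P are combined (left to right) is an assumption.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

ctx = Context(prec=3, rounding=ROUND_HALF_UP)   # three significant digits
ONE, THREE = Decimal(1), Decimal(3)

def P(x):
    """x^3 - 3x^2 + 3x - 1, evaluated term by term."""
    x2 = ctx.multiply(x, x)
    x3 = ctx.multiply(x2, x)
    total = ctx.subtract(x3, ctx.multiply(THREE, x2))
    total = ctx.add(total, ctx.multiply(THREE, x))
    return ctx.subtract(total, ONE)

def Q(x):
    """((x - 3)x + 3)x - 1, nested multiplication."""
    total = ctx.multiply(ctx.subtract(x, THREE), x)
    total = ctx.multiply(ctx.add(total, THREE), x)
    return ctx.subtract(total, ONE)

x = Decimal("2.19")
print(P(x), Q(x))
```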


$O(h^n)$ Order of Approximation

Clearly the sequences $\left\{\frac{1}{n^2}\right\}_{n=1}^{\infty}$ and $\left\{\frac{1}{n}\right\}_{n=1}^{\infty}$ are both converging to zero. In addition, it
should be observed that the first sequence is converging to zero more rapidly than the
second sequence. In the coming chapters some special terminology and notation will
be used to describe how rapidly a sequence is converging.

Definition 1.9. The function f(h) is said to be big Oh of g(h), denoted $f(h) =
O(g(h))$, if there exist constants C and c such that

(7) $|f(h)| \le C|g(h)|$ whenever $h \le c$.

Example 1.20. Consider the functions $f(x) = x^2 + 1$ and $g(x) = x^3$. Since $x^2 \le x^3$ and
$1 \le x^3$ for $x \ge 1$, it follows that $x^2 + 1 \le 2x^3$ for $x \ge 1$. Therefore, $f(x) = O(g(x))$.

The big Oh notation provides a useful way of describing the rate of growth of a function
in terms of well-known elementary functions ($x^n$, $x^{1/n}$, $a^x$, $\log_a x$, etc.).

The rate of convergence of sequences can be described in a similar manner.

Definition 1.10. Let $\{x_n\}_{n=1}^{\infty}$ and $\{y_n\}_{n=1}^{\infty}$ be two sequences. The sequence $\{x_n\}$ is
said to be of order big Oh of $\{y_n\}$, denoted $x_n = O(y_n)$, if there exist constants C
and N such that

(8) $|x_n| \le C|y_n|$ whenever $n \ge N$.

Example 1.21. $\frac{n^2 - 1}{n^3} = O\left(\frac{1}{n}\right)$, since $\frac{n^2 - 1}{n^3} \le \frac{n^2}{n^3} = \frac{1}{n}$ whenever $n \ge 1$.

Often a function f(h) is approximated by a function p(h) and the error bound is
known to be $M|h^n|$. This leads to the following definition.

Definition 1.11. Assume that f(h) is approximated by the function p(h) and that
there exist a real constant M > 0 and a positive integer n so that

(9) $\frac{|f(h) - p(h)|}{|h^n|} \le M$ for sufficiently small h.

We say that p(h) approximates f(h) with order of approximation $O(h^n)$ and write

(10) $f(h) = p(h) + O(h^n)$.

When relation (9) is rewritten in the form $|f(h) - p(h)| \le M|h^n|$, we see that the
notation $O(h^n)$ stands in place of the error bound $M|h^n|$. The following results show
how to apply the definition to simple combinations of two functions.

Theorem 1.15. Assume that $f(h) = p(h) + O(h^n)$, $g(h) = q(h) + O(h^m)$, and
$r = \min\{m, n\}$. Then

(11) $f(h) + g(h) = p(h) + q(h) + O(h^r)$,

(12) $f(h)g(h) = p(h)q(h) + O(h^r)$,

and

(13) $\frac{f(h)}{g(h)} = \frac{p(h)}{q(h)} + O(h^r)$ provided that $g(h) \ne 0$ and $q(h) \ne 0$.

It is instructive to consider p(x) to be the nth Taylor polynomial approximation
of f(x); then the remainder term is simply designated $O(h^{n+1})$, which stands for the
presence of omitted terms starting with the power $h^{n+1}$. The remainder term converges
to zero with the same rapidity that $h^{n+1}$ converges to zero as h approaches zero, as
expressed in the relationship

(14) $O(h^{n+1}) \approx M h^{n+1} \approx \frac{f^{(n+1)}(c)}{(n+1)!} h^{n+1}$

for sufficiently small h. Hence the notation $O(h^{n+1})$ stands in place of the quantity
$M h^{n+1}$, where M is a constant or "behaves like a constant."

Theorem 1.16 (Taylor's Theorem). Assume that $f \in C^{n+1}[a, b]$. If both $x_0$ and
$x = x_0 + h$ lie in $[a, b]$, then

$f(x_0 + h) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} h^k + O(h^{n+1})$.

The following example illustrates the above theorems. The computations use the
addition properties (i) $O(h^p) + O(h^p) = O(h^p)$, (ii) $O(h^p) + O(h^q) = O(h^r)$,
where $r = \min\{p, q\}$, and the multiplicative property (iii) $O(h^p)O(h^q) = O(h^s)$,
where $s = p + q$.

Example 1.22. Consider the Taylor polynomial expansions

$e^h = 1 + h + \frac{h^2}{2!} + \frac{h^3}{3!} + O(h^4)$ and $\cos(h) = 1 - \frac{h^2}{2!} + \frac{h^4}{4!} + O(h^6)$.

Determine the order of approximation for their sum and product.

For the sum we have

\[
\begin{aligned}
e^h + \cos(h) &= 1 + h + \frac{h^2}{2!} + \frac{h^3}{3!} + O(h^4) + 1 - \frac{h^2}{2!} + \frac{h^4}{4!} + O(h^6) \\
&= 2 + h + \frac{h^3}{3!} + O(h^4) + \frac{h^4}{4!} + O(h^6).
\end{aligned}
\]

Since $O(h^4) + \frac{h^4}{4!} = O(h^4)$ and $O(h^4) + O(h^6) = O(h^4)$, this reduces to

$e^h + \cos(h) = 2 + h + \frac{h^3}{3!} + O(h^4)$,

and the order of approximation is $O(h^4)$.

The product is treated similarly.

\[
\begin{aligned}
e^h \cos(h) &= \left(1 + h + \frac{h^2}{2!} + \frac{h^3}{3!} + O(h^4)\right)\left(1 - \frac{h^2}{2!} + \frac{h^4}{4!} + O(h^6)\right) \\
&= \left(1 + h + \frac{h^2}{2!} + \frac{h^3}{3!}\right)\left(1 - \frac{h^2}{2!} + \frac{h^4}{4!}\right)
 + \left(1 + h + \frac{h^2}{2!} + \frac{h^3}{3!}\right) O(h^6) \\
&\quad + \left(1 - \frac{h^2}{2!} + \frac{h^4}{4!}\right) O(h^4) + O(h^4)O(h^6) \\
&= 1 + h - \frac{h^3}{3} - \frac{5h^4}{24} - \frac{h^5}{24} + \frac{h^6}{48} + \frac{h^7}{144} \\
&\quad + O(h^6) + O(h^4) + O(h^4)O(h^6).
\end{aligned}
\]

Since $O(h^4)O(h^6) = O(h^{10})$ and

$-\frac{5h^4}{24} - \frac{h^5}{24} + \frac{h^6}{48} + \frac{h^7}{144} + O(h^6) + O(h^4) + O(h^{10}) = O(h^4)$,

the preceding equation is simplified to yield

$e^h \cos(h) = 1 + h - \frac{h^3}{3} + O(h^4)$,

and the order of approximation is $O(h^4)$.
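An order of approximation $O(h^4)$ means that the remainder divided by $h^4$ stays bounded as h shrinks. The Python sketch below checks this numerically for the sum and product of $e^h$ and $\cos(h)$. The function names and the sample values of h are illustrative assumptions.

```python
import math

def sum_remainder(h):
    """e^h + cos(h) minus the approximation 2 + h + h^3/3!."""
    return math.exp(h) + math.cos(h) - (2 + h + h ** 3 / 6)

def product_remainder(h):
    """e^h * cos(h) minus the approximation 1 + h - h^3/3."""
    return math.exp(h) * math.cos(h) - (1 + h - h ** 3 / 3)

for h in (0.1, 0.05, 0.025):
    print(h, sum_remainder(h) / h ** 4, product_remainder(h) / h ** 4)
```

The two ratios settle near 1/12 and -1/6, the coefficients of $h^4$ in the full Taylor series of the sum and product, so each ratio "behaves like a constant" M.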


Order of Convergence of a Sequence 

Numerical approximations are often arrived at by computing a sequence of
approximations that get closer and closer to the desired answer. The definition of big Oh for
sequences was given in Definition 1.10, and the definition of order of convergence for
a sequence is analogous to that given for functions in Definition 1.11.

Definition 1.12. Suppose that $\lim_{n\to\infty} x_n = x$ and $\{r_n\}_{n=1}^{\infty}$ is a sequence with
$\lim_{n\to\infty} r_n = 0$. We say that $\{x_n\}_{n=1}^{\infty}$ converges to x with the order of
convergence $O(r_n)$, if there exists a constant K > 0 such that

$\frac{|x_n - x|}{|r_n|} \le K$ for n sufficiently large.

This is indicated by writing $x_n = x + O(r_n)$, or $x_n \to x$ with order of convergence $O(r_n)$.

Example 1.23. Let $x_n = \cos(n)/n^2$ and $r_n = 1/n^2$; then $\lim_{n\to\infty} x_n = 0$ with a rate of
convergence $O(1/n^2)$. This follows immediately from the relation

$\frac{|\cos(n)/n^2|}{|1/n^2|} = |\cos(n)| \le 1$ for all n.
Propagation of Error 

Let us investigate how error might be propagated in successive computations. Consider
the addition of two numbers p and q (the true values) with the approximate values $\hat{p}$
and $\hat{q}$, which contain errors $\epsilon_p$ and $\epsilon_q$, respectively. Starting with $p = \hat{p} + \epsilon_p$ and
$q = \hat{q} + \epsilon_q$, the sum is

$p + q = (\hat{p} + \epsilon_p) + (\hat{q} + \epsilon_q) = (\hat{p} + \hat{q}) + (\epsilon_p + \epsilon_q)$.

Hence, for addition, the error in the sum is the sum of the errors in the addends.

The propagation of error in multiplication is more complicated. The product is

(17) $pq = (\hat{p} + \epsilon_p)(\hat{q} + \epsilon_q) = \hat{p}\hat{q} + \hat{p}\epsilon_q + \hat{q}\epsilon_p + \epsilon_p\epsilon_q$.

Hence, if $\hat{p}$ and $\hat{q}$ are larger than 1 in absolute value, terms $\hat{p}\epsilon_q$ and $\hat{q}\epsilon_p$ show that there
is a possibility of magnification of the original errors $\epsilon_p$ and $\epsilon_q$. Insights are gained if
we look at the relative error. Rearrange the terms in (17) to get

(18) $pq - \hat{p}\hat{q} = \hat{p}\epsilon_q + \hat{q}\epsilon_p + \epsilon_p\epsilon_q$.

Suppose that $p \ne 0$ and $q \ne 0$; then we can divide (18) by pq to obtain the relative
error in the product $\hat{p}\hat{q}$:

(19) $R_{pq} = \frac{pq - \hat{p}\hat{q}}{pq} = \frac{\hat{p}\epsilon_q + \hat{q}\epsilon_p + \epsilon_p\epsilon_q}{pq} = \frac{\hat{p}\epsilon_q}{pq} + \frac{\hat{q}\epsilon_p}{pq} + \frac{\epsilon_p\epsilon_q}{pq}$.

Furthermore, suppose that $\hat{p}$ and $\hat{q}$ are good approximations for p and q; then
$\hat{p}/p \approx 1$, $\hat{q}/q \approx 1$, and $R_p R_q = (\epsilon_p/p)(\epsilon_q/q) \approx 0$ ($R_p$ and $R_q$ are the relative errors
in the approximations $\hat{p}$ and $\hat{q}$). Then making these substitutions into (19) yields a
simplified relationship:

(20) $R_{pq} = \frac{pq - \hat{p}\hat{q}}{pq} \approx \frac{\epsilon_q}{q} + \frac{\epsilon_p}{p} + 0 = R_q + R_p$.

This shows that the relative error in the product pq is approximately the sum of the
relative errors in the approximations $\hat{p}$ and $\hat{q}$.
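A quick numerical check of (20) is sketched below in Python. The sample values of p, q and their approximations are arbitrary assumptions chosen for illustration, and the relative errors are the signed quantities $\epsilon_p/p$ and $\epsilon_q/q$ used in (19).

```python
def signed_relative_error(true, approx):
    """(true - approx) / true, the signed quantity used in (19) and (20)."""
    return (true - approx) / true

p, q = 3.141592, 2.718281          # illustrative "true" values (assumed)
p_hat, q_hat = 3.1416, 2.7183      # illustrative approximations (assumed)

r_p = signed_relative_error(p, p_hat)
r_q = signed_relative_error(q, q_hat)
r_pq = signed_relative_error(p * q, p_hat * q_hat)

# The gap between r_pq and r_p + r_q is the neglected second-order term.
print(r_p, r_q, r_pq, r_pq - (r_p + r_q))
```

Working through (19) without the approximations gives $R_{pq} = R_p + R_q - R_p R_q$ exactly, so the printed gap is of the size of the product $R_p R_q$.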

Often an initial error will be propagated in a sequence of calculations. A quality
that is desirable for any numerical process is that a small error in the initial conditions
will produce small changes in the final result. An algorithm with this feature is called
stable; otherwise, it is called unstable. Whenever possible we shall choose methods
that are stable. The following definition is used to describe the propagation of error.

Definition 1.13. Suppose that $\epsilon$ represents an initial error and $\epsilon(n)$ represents the
growth of the error after n steps. If $|\epsilon(n)| \approx n\epsilon$, the growth of error is said to be linear.
If $|\epsilon(n)| \approx K^n \epsilon$, the growth of error is called exponential. If K > 1, the exponential
error grows without bound as $n \to \infty$, and if 0 < K < 1, the exponential error
diminishes to zero as $n \to \infty$.

The next two examples show how an initial error can propagate in either a stable
or an unstable fashion. In the first example, three algorithms are introduced. Each
algorithm recursively generates the same sequence. Then, in the second example, small
changes will be made to the initial conditions and the propagation of error will be
analyzed.
Example 1.24. Show that the following three schemes can be used with infinite-precision
arithmetic to recursively generate the terms in the sequence $\{1/3^n\}_{n=0}^{\infty}$.

(21a) $r_0 = 1$ and $r_n = \frac{1}{3} r_{n-1}$ for n = 1, 2, ...,

(21b) $p_0 = 1$, $p_1 = \frac{1}{3}$, and $p_n = \frac{4}{3} p_{n-1} - \frac{1}{3} p_{n-2}$ for n = 2, 3, ...,

(21c) $q_0 = 1$, $q_1 = \frac{1}{3}$, and $q_n = \frac{10}{3} q_{n-1} - q_{n-2}$ for n = 2, 3, ....

Formula (21a) is obvious. In (21b) the difference equation has the general solution $p_n =
A(1/3^n) + B$. This can be verified by direct substitution:

\[
\begin{aligned}
\frac{4}{3} p_{n-1} - \frac{1}{3} p_{n-2} &= \frac{4}{3}\left(\frac{A}{3^{n-1}} + B\right) - \frac{1}{3}\left(\frac{A}{3^{n-2}} + B\right) \\
&= \left(\frac{4}{3^n} - \frac{3}{3^n}\right) A + \left(\frac{4}{3} - \frac{1}{3}\right) B = \frac{A}{3^n} + B = p_n.
\end{aligned}
\]

Setting A = 1 and B = 0 will generate the desired sequence. In (21c) the difference
equation has the general solution $q_n = A(1/3^n) + B\,3^n$. This too is verified by substitution:

\[
\begin{aligned}
\frac{10}{3} q_{n-1} - q_{n-2} &= \frac{10}{3}\left(\frac{A}{3^{n-1}} + B\,3^{n-1}\right) - \left(\frac{A}{3^{n-2}} + B\,3^{n-2}\right) \\
&= \left(\frac{10}{3^n} - \frac{9}{3^n}\right) A + (10 - 1)\,3^{n-2} B = \frac{A}{3^n} + B\,3^n = q_n.
\end{aligned}
\]

Setting A = 1 and B = 0 generates the required sequence.

Example 1.25. Generate approximations to the sequence $\{x_n\} = \{1/3^n\}$ using the
schemes

(22a) $r_0 = 0.99996$ and $r_n = \frac{1}{3} r_{n-1}$ for n = 1, 2, ...,

(22b) $p_0 = 1$, $p_1 = 0.33332$, and $p_n = \frac{4}{3} p_{n-1} - \frac{1}{3} p_{n-2}$ for n = 2, 3, ...,

(22c) $q_0 = 1$, $q_1 = 0.33332$, and $q_n = \frac{10}{3} q_{n-1} - q_{n-2}$ for n = 2, 3, ....

In (22a) the initial error in $r_0$ is 0.00004, and in (22b) and (22c) the initial errors in $p_1$
and $q_1$ are 0.000013. Investigate the propagation of error for each scheme.

Table 1.4 gives the first ten numerical approximations for each sequence, and Table 1.5
gives the error in each formula. The error for $\{r_n\}$ is stable and decreases in an exponential
manner. The error for $\{p_n\}$ is stable. The error for $\{q_n\}$ is unstable and grows at an
exponential rate. Although the error for $\{p_n\}$ is stable, the terms $p_n \to 0$ as $n \to \infty$, so that
the error eventually dominates and the terms past $p_8$ have no significant digits. Figures 1.8,
1.9, and 1.10 show the errors in $\{r_n\}$, $\{p_n\}$, and $\{q_n\}$, respectively.


Table 1.4 The Sequence $\{x_n\} = \{1/3^n\}$ and the Approximations $\{r_n\}$, $\{p_n\}$, and $\{q_n\}$

 n   x_n                        r_n             p_n              q_n
 0   1       = 1.0000000000     0.9999600000     1.0000000000     1.0000000000
 1   1/3     = 0.3333333333     0.3333200000     0.3333200000     0.3333200000
 2   1/9     = 0.1111111111     0.1111066667     0.1110933330     0.1110666667
 3   1/27    = 0.0370370370     0.0370355556     0.0370177778     0.0369022222
 4   1/81    = 0.0123456790     0.0123451852     0.0123259259     0.0119407407
 5   1/243   = 0.0041152263     0.0041150617     0.0040953086     0.0029002469
 6   1/729   = 0.0013717421     0.0013716872     0.0013517695    -0.0022732510
 7   1/2187  = 0.0004572474     0.0004572291     0.0004372565    -0.0104777503
 8   1/6561  = 0.0001524158     0.0001524097     0.0001324188    -0.0326525834
 9   1/19683 = 0.0000508053     0.0000508032     0.0000308063    -0.0983641945
10   1/59049 = 0.0000169351     0.0000169344    -0.0000030646    -0.2952280648


Table 1.5 The Error Sequences $\{x_n - r_n\}$, $\{x_n - p_n\}$, and $\{x_n - q_n\}$

 n   x_n - r_n       x_n - p_n       x_n - q_n
 0   0.0000400000    0.0000000000    0.0000000000
 1   0.0000133333    0.0000133333    0.0000133333
 2   0.0000044444    0.0000177778    0.0000444444
 3   0.0000014815    0.0000192593    0.0001348148
 4   0.0000004938    0.0000197531    0.0004049383
 5   0.0000001646    0.0000199177    0.0012149794
 6   0.0000000549    0.0000199726    0.0036449931
 7   0.0000000183    0.0000199909    0.0109349977
 8   0.0000000061    0.0000199970    0.0328049992
 9   0.0000000020    0.0000199990    0.0984149998
10   0.0000000007    0.0000199997    0.2952449999
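The three perturbed recurrences of Example 1.25 are easy to run. The Python sketch below uses exact fractions so that the only error present is the one introduced in the starting values. The list names mirror the sequences in the example, and the output format is an illustrative choice.

```python
from fractions import Fraction

N = 10
x = [Fraction(1, 3 ** n) for n in range(N + 1)]   # exact sequence 1/3^n

r = [Fraction("0.99996")]                          # scheme (22a)
p = [Fraction(1), Fraction("0.33332")]             # scheme (22b)
q = [Fraction(1), Fraction("0.33332")]             # scheme (22c)
for n in range(1, N + 1):
    r.append(r[n - 1] / 3)
for n in range(2, N + 1):
    p.append(Fraction(4, 3) * p[n - 1] - Fraction(1, 3) * p[n - 2])
    q.append(Fraction(10, 3) * q[n - 1] - q[n - 2])

print(" n   x_n - r_n      x_n - p_n      x_n - q_n")
for n in range(N + 1):
    print(f"{n:2d}  {float(x[n] - r[n]):.10f}  "
          f"{float(x[n] - p[n]):.10f}  {float(x[n] - q[n]):.10f}")
```

The error in $\{q_n\}$ roughly triples at each step because the perturbed starting value excites the $B\,3^n$ component of the general solution in Example 1.24.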
Figure 1.8 A stable decreasing error sequence $\{x_n - r_n\}$.

Figure 1.9 A stable error sequence $\{x_n - p_n\}$.

Figure 1.10 An unstable increasing error sequence $\{x_n - q_n\}$.


Uncertainty in Data

Data from real-world problems contain uncertainty or error. This type of error is
referred to as noise. It will affect the accuracy of any numerical computation that is based
on the data. An improvement of precision is not accomplished by performing
successive computations using noisy data. Hence, if you start with data with d significant
digits of accuracy, then the result of a computation should be reported in d significant
digits of accuracy. For example, suppose that the data $p_1 = 4.152$ and $p_2 = 0.07931$
both have four significant digits of accuracy. Then it is tempting to report all the digits
that appear on your calculator (i.e., $p_1 + p_2 = 4.23131$). This is an oversight, because
you should not report conclusions from noisy data that have more significant digits
than the original data. The proper answer in this situation is $p_1 + p_2 = 4.231$.


Exercises for Error Analysis 

1. Find the error $E_x$ and relative error $R_x$. Also determine the number of significant
digits in the approximation.

(a) $x = 2.71828182$, $\hat{x} = 2.7182$

(b) $y = 98{,}350$, $\hat{y} = 98{,}000$

(c) $z = 0.000068$, $\hat{z} = 0.00006$

2. Complete the following computation:

$\int_0^{1/4} e^{x^2}\,dx \approx \int_0^{1/4} \left(1 + x^2 + \frac{x^4}{2!} + \frac{x^6}{3!}\right) dx = \hat{p}$.

State what type of error is present in this situation. Compare your answer with the
true value p = 0.2553074606.

3. (a) Consider the data $p_1 = 1.414$ and $p_2 = 0.09125$, which have four significant
digits of accuracy. Determine the proper answer for the sum $p_1 + p_2$ and the
product $p_1 p_2$.

(b) Consider the data $p_1 = 31.415$ and $p_2 = 0.027182$, which have five significant
digits of accuracy. Determine the proper answer for the sum $p_1 + p_2$ and the
product $p_1 p_2$.

4. Complete the following computation and state what type of error is present in this
situation.

(a) $\frac{\sin\left(\frac{\pi}{4} + 0.00001\right) - \sin\left(\frac{\pi}{4}\right)}{0.00001} = \frac{0.70711385222 - 0.70710678119}{0.00001} = \cdots$

(b) $\frac{\ln(2 + 0.00005) - \ln(2)}{0.00005} = \frac{0.69317218025 - 0.69314718056}{0.00005} = \cdots$

5. Sometimes the loss of significance error can be avoided by rearranging terms in the
function using a known identity from trigonometry or algebra. Find an equivalent
formula for the following functions that avoids a loss of significance.

(a) $\ln(x + 1) - \ln(x)$ for large x

(b) $\sqrt{x^2 + 1} - x$ for large x

(c) $\cos^2(x) - \sin^2(x)$ for $x \approx \pi/4$

(d) $\sqrt{\frac{1 + \cos(x)}{2}}$ for $x \approx \pi$

6. Polynomial Evaluation. Let $P(x) = x^3 - 3x^2 + 3x - 1$, $Q(x) = ((x - 3)x + 3)x - 1$,
and $R(x) = (x - 1)^3$.

(a) Use four-digit rounding arithmetic and compute P(2.72), Q(2.72), and R(2.72).
In the computation of P(x), assume that $(2.72)^3 = 20.12$ and $(2.72)^2 = 7.398$.

(b) Use four-digit rounding arithmetic and compute P(0.975), Q(0.975), and
R(0.975). In the computation of P(x), assume that $(0.975)^3 = 0.9268$ and
$(0.975)^2 = 0.9506$.

7. Use three-digit rounding arithmetic to compute the following sums (sum in the given 
order): 

(a) X3 Ll jf < b ) LLi 

8. Discuss the propagation of error for the following:

(a) The sum of three numbers:

$p + q + r = (\hat{p} + \epsilon_p) + (\hat{q} + \epsilon_q) + (\hat{r} + \epsilon_r)$.

(b) The quotient of two numbers: $\frac{p}{q} = \frac{\hat{p} + \epsilon_p}{\hat{q} + \epsilon_q}$.

(c) The product of three numbers:

$pqr = (\hat{p} + \epsilon_p)(\hat{q} + \epsilon_q)(\hat{r} + \epsilon_r)$.

9. Given the Taylor polynomial expansions

$\frac{1}{1 - h} = 1 + h + h^2 + h^3 + O(h^4)$

and

$\cos(h) = 1 - \frac{h^2}{2!} + \frac{h^4}{4!} + O(h^6)$.

Determine the order of approximation for their sum and product.

10. Given the Taylor polynomial expansions

$e^h = 1 + h + \frac{h^2}{2!} + \frac{h^3}{3!} + \frac{h^4}{4!} + O(h^5)$

and

$\sin(h) = h - \frac{h^3}{3!} + O(h^5)$.

Determine the order of approximation for their sum and product.

11. Given the Taylor polynomial expansions

$\cos(h) = 1 - \frac{h^2}{2!} + \frac{h^4}{4!} + O(h^6)$

and

$\sin(h) = h - \frac{h^3}{3!} + \frac{h^5}{5!} + O(h^7)$.

Determine the order of approximation for their sum and product.




12. Improving the Quadratic Formula. Assume that $a \ne 0$ and $b^2 - 4ac > 0$ and consider
the equation $ax^2 + bx + c = 0$. The roots can be computed with the quadratic formulas

(i)    $x_1 = \dfrac{-b + \sqrt{b^2 - 4ac}}{2a}$ and $x_2 = \dfrac{-b - \sqrt{b^2 - 4ac}}{2a}$.

Show that these roots can be calculated with the equivalent formulas

(ii)    $x_1 = \dfrac{-2c}{b + \sqrt{b^2 - 4ac}}$ and $x_2 = \dfrac{-2c}{b - \sqrt{b^2 - 4ac}}$.

Hint. Rationalize the numerators in (i). Remark. In the cases when $|b| \approx \sqrt{b^2 - 4ac}$,
one must proceed with caution to avoid loss of precision due to a catastrophic
cancellation. If $b > 0$, then $x_1$ should be computed with formula (ii) and $x_2$ should be
computed using (i). However, if $b < 0$, then $x_1$ should be computed using (i) and $x_2$
should be computed using (ii).

13. Use the appropriate formula for $x_1$ and $x_2$ mentioned in Exercise 12 to find the roots
of the following quadratic equations.

(a) $x^2 - 1{,}000.001x + 1 = 0$

(b) $x^2 - 10{,}000.0001x + 1 = 0$

(c) $x^2 - 100{,}000.00001x + 1 = 0$

(d) $x^2 - 1{,}000{,}000.000001x + 1 = 0$
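The Remark in Exercise 12 can be illustrated with a short MATLAB sketch. The coefficients below are hypothetical (they are not taken from Exercise 13), and the script is only one possible way to organize the choice between formulas (i) and (ii); it assumes $b^2 - 4ac > 0$.

```matlab
% Sketch: roots of a*x^2 + b*x + c = 0 using formulas (i) and (ii).
% Hypothetical coefficients with |b| very close to sqrt(b^2 - 4ac).
a = 1; b = 1e8; c = 1;
s = sqrt(b^2 - 4*a*c);            % assumes b^2 - 4ac > 0

% Formula (i) for both roots; for b > 0 the numerator of x1 subtracts
% two nearly equal numbers.
x1_naive = (-b + s)/(2*a);
x2_naive = (-b - s)/(2*a);

% Mix (i) and (ii) so that b and s are never subtracted from each other.
if b > 0
    x1 = -2*c/(b + s);            % formula (ii)
    x2 = (-b - s)/(2*a);          % formula (i)
else
    x1 = (-b + s)/(2*a);          % formula (i)
    x2 = -2*c/(b - s);            % formula (ii)
end

fprintf('x1 via (i):  %.16e\n', x1_naive);
fprintf('x1 via (ii): %.16e\n', x1);
fprintf('x2:          %.16e\n', x2);
```

For these coefficients the small root is close to $-10^{-8}$; comparing the two printed values of $x_1$ shows how much of it survives the subtraction in formula (i).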


Algorithms and Programs 

1. Use the results of Exercises 12 and 13 to construct an algorithm and MATLAB program
that will accurately compute the roots of a quadratic equation in all situations,
including the troublesome ones when $|b| \approx \sqrt{b^2 - 4ac}$.

2. Follow Example 1.25 and generate the first ten numerical approximations for each
of the following three difference equations. In each case a small initial error is
introduced. If there were no initial error, then each of the difference equations would
generate the sequence $\{1/2^n\}_{n=0}^{\infty}$. Produce output analogous to Tables 1.4 and 1.5 and
Figures 1.8, 1.9, and 1.10.

(a) $r_0 = 0.994$ and $r_n = \frac{1}{2} r_{n-1}$, for $n = 1, 2, \ldots$

(b) $p_0 = 1$, $p_1 = 0.497$, and $p_n = \frac{3}{2} p_{n-1} - \frac{1}{2} p_{n-2}$, for $n = 2, 3, \ldots$

(c) $q_0 = 1$, $q_1 = 0.497$, and $q_n = \frac{5}{2} q_{n-1} - q_{n-2}$, for $n = 2, 3, \ldots$


Consider the physical problem that involves a spherical ball of radius $r$ that is
submerged to a depth $d$ in water (see Figure 2.1). Assume that the ball is constructed from
a variety of longleaf pine that has a density of $\rho = 0.638$ and that its radius measures
$r = 10$ cm. How much of the ball will be submerged when it is placed in water?

The mass $M_w$ of water displaced when a sphere is submerged to a depth $d$ is

$M_w = \int_0^d \pi\left(r^2 - (x - r)^2\right)dx = \dfrac{\pi d^2(3r - d)}{3}$,

and the mass of the ball is $M_b = 4\pi r^3 \rho/3$. Applying Archimedes' law, $M_w = M_b$,
produces the following equation that must be solved:

$\dfrac{\pi(d^3 - 3d^2 r + 4r^3\rho)}{3} = 0$.

Figure 2.1 The portion of a sphere of radius $r$ that is to be submerged to a depth $d$.

In our case (with $r = 10$ and $\rho = 0.638$) this equation becomes

$\dfrac{\pi(2552 - 30d^2 + d^3)}{3} = 0$.

The graph of the cubic polynomial $y = 2552 - 30d^2 + d^3$ is shown in Figure 2.2, and
from it one can see that the solution lies near the value $d = 12$.

The goal of this chapter is to develop a variety of methods for finding numerical
approximations for the roots of an equation. For example, the bisection method can
be applied to obtain the three roots $d_1 = -8.17607212$, $d_2 = 11.86150151$, and
$d_3 = 26.31457061$. The first root $d_1$ is not a feasible solution for this problem, because
$d$ cannot be negative. The third root $d_3$ is larger than the diameter of the sphere and
is not the desired solution. The root $d_2 = 11.86150151$ lies in the interval $[0, 20]$ and
is the proper solution. Its magnitude is reasonable because a little more than one-half
of the sphere must be submerged.
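The three roots quoted above can be cross-checked with a few lines of MATLAB. This sketch uses the built-in `roots` function rather than the bisection method mentioned in the text, purely as an independent check; the variable names are illustrative.

```matlab
% Sketch: roots of d^3 - 3 r d^2 + 4 r^3 rho = 0 for r = 10, rho = 0.638.
r = 10; rho = 0.638;
coeffs = [1, -3*r, 0, 4*r^3*rho];     % d^3 - 30 d^2 + 0 d + 2552
d = sort(real(roots(coeffs)));        % all three roots are real here
feasible = d(d >= 0 & d <= 2*r);      % a depth must lie in [0, 2r]
fprintf('roots: %.8f  %.8f  %.8f\n', d);
fprintf('feasible depth: %.8f cm\n', feasible);
```

Only one root survives the physical restriction $0 \le d \le 2r$, which is the root $d_2$ singled out in the discussion.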


2.1 Iteration for Solving $x = g(x)$

A fundamental principle in computer science is iteration. As the name suggests, a
process is repeated until an answer is achieved. Iterative techniques are used to find
roots of equations, solutions of linear and nonlinear systems of equations, and solutions
of differential equations. In this section we study the process of iteration using repeated
substitution.

A rule or function $g(x)$ for computing successive terms is needed, together with a
starting value $p_0$. Then a sequence of values $\{p_k\}$ is obtained using the iterative rule
$p_{k+1} = g(p_k)$. The sequence has the pattern

(1)    $p_0$ (starting value)
       $p_1 = g(p_0)$
       $p_2 = g(p_1)$
       $\vdots$
       $p_k = g(p_{k-1})$
       $p_{k+1} = g(p_k)$
       $\vdots$

What can we learn from an unending sequence of numbers? If the numbers tend
to a limit, we feel that something has been achieved. But what if the numbers diverge
or are periodic? The next example addresses this situation.

Example 2.1. The iterative rule $p_0 = 1$ and $p_{k+1} = 1.001p_k$ for $k = 0, 1, \ldots$ produces
a divergent sequence. The first 100 terms look as follows:

$p_1 = 1.001p_0 = (1.001)(1.000000) = 1.001000$,
$p_2 = 1.001p_1 = (1.001)(1.001000) = 1.002001$,
$p_3 = 1.001p_2 = (1.001)(1.002001) = 1.003003$,
$\vdots$
$p_{100} = 1.001p_{99} = (1.001)(1.104012) = 1.105116$.

The process can be continued indefinitely, and it is easily shown that $\lim_{n\to\infty} p_n = +\infty$.
In Chapter 9 we will see that the sequence $\{p_k\}$ is a numerical solution to the differential
equation $y' = 0.001y$. The solution is known to be $y(x) = e^{0.001x}$. Indeed, if we compare
the 100th term in the sequence with $y(100)$, we see that $p_{100} = 1.105116 \approx 1.105171 =
e^{0.1} = y(100)$.
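The numbers in Example 2.1 are easy to reproduce. The following MATLAB sketch (variable names are illustrative) carries out the 100 multiplications and compares the result with $e^{0.1}$.

```matlab
% Sketch: iterate p_{k+1} = 1.001*p_k starting from p_0 = 1.
p = 1;
for k = 1:100
    p = 1.001*p;                  % after this pass p holds p_k
end
y100 = exp(0.001*100);            % y(100) for y(x) = exp(0.001 x)
fprintf('p_100  = %.6f\n', p);
fprintf('y(100) = %.6f\n', y100);
```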

In this section we are concerned with the types of functions $g(x)$ that produce
convergent sequences $\{p_k\}$.

Finding Fixed Points

Definition 2.1 (Fixed Point). A fixed point of a function $g(x)$ is a real number $P$
such that $P = g(P)$.

Geometrically, the fixed points of a function $y = g(x)$ are the points of intersection
of $y = g(x)$ and $y = x$.


Definition 2.2 (Fixed-point Iteration). The iteration $p_{n+1} = g(p_n)$ for $n = 0,
1, \ldots$ is called fixed-point iteration.

Theorem 2.1. Assume that $g$ is a continuous function and that $\{p_n\}_{n=0}^{\infty}$ is a sequence
generated by fixed-point iteration. If $\lim_{n\to\infty} p_n = P$, then $P$ is a fixed point of $g(x)$.

Proof. If $\lim_{n\to\infty} p_n = P$, then $\lim_{n\to\infty} p_{n+1} = P$. It follows from this result, the
continuity of $g$, and the relation $p_{n+1} = g(p_n)$ that

(2)    $g(P) = g\left(\lim_{n\to\infty} p_n\right) = \lim_{n\to\infty} g(p_n) = \lim_{n\to\infty} p_{n+1} = P.$

Therefore, $P$ is a fixed point of $g(x)$.

Example 2.2. Consider the convergent iteration

$p_0 = 0.5$ and $p_{k+1} = e^{-p_k}$ for $k = 0, 1, \ldots$

The first 10 terms are obtained by the calculations

$p_1 = e^{-0.500000} = 0.606531$
$p_2 = e^{-0.606531} = 0.545239$
$p_3 = e^{-0.545239} = 0.579703$
$\vdots$
$p_9 = e^{-0.566409} = 0.567560$
$p_{10} = e^{-0.567560} = 0.566907$

The sequence is converging, and further calculations reveal that

$\lim_{n\to\infty} p_n = 0.567143\ldots$

Thus we have found an approximation for the fixed point of the function $y = e^{-x}$.
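The calculations of Example 2.2 can be repeated with the short MATLAB sketch below. The vector name and the index shift (the entry `P(k+1)` stores $p_k$, because MATLAB indices start at 1) are choices made for this illustration.

```matlab
% Sketch: fixed-point iteration p_{k+1} = exp(-p_k) with p_0 = 0.5.
g = @(x) exp(-x);
P = zeros(11,1);
P(1) = 0.5;                       % P(k+1) stores p_k
for k = 1:10
    P(k+1) = g(P(k));
end
for k = 0:10
    fprintf('p_%-2d = %.6f\n', k, P(k+1));
end
% How far p_10 is from satisfying p = g(p):
residual = abs(P(11) - g(P(11)));
fprintf('|p_10 - g(p_10)| = %.2e\n', residual);
```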

The following two theorems establish conditions for the existence of a fixed point
and the convergence of the fixed-point iteration process to a fixed point.

Theorem 2.2. Assume that $g \in C[a, b]$.

(3) If the range of the mapping $y = g(x)$ satisfies $y \in [a, b]$ for all $x \in [a, b]$, then
$g$ has a fixed point in $[a, b]$.

(4) Furthermore, suppose that $g'(x)$ is defined over $(a, b)$ and that a positive constant
$K < 1$ exists with $|g'(x)| \le K < 1$ for all $x \in (a, b)$; then $g$ has a unique fixed
point $P$ in $[a, b]$.

Proof of (3). If $g(a) = a$ or $g(b) = b$, the assertion is true. Otherwise, the values
of $g(a)$ and $g(b)$ must satisfy $g(a) \in (a, b]$ and $g(b) \in [a, b)$. The function $f(x) =
x - g(x)$ has the property that

$f(a) = a - g(a) < 0$ and $f(b) = b - g(b) > 0.$

Now apply Theorem 1.2, the Intermediate Value Theorem, to $f(x)$, with the constant
$L = 0$, and conclude that there exists a number $P$ with $P \in (a, b)$ so that $f(P) = 0$.
Therefore, $P = g(P)$ and $P$ is the desired fixed point of $g(x)$.

Proof of (4). Now we must show that this solution is unique. By way of contradiction,
let us make the additional assumption that there exist two fixed points $P_1$ and $P_2$.
Now apply Theorem 1.6, the Mean Value Theorem, and conclude that there exists a
number $d \in (a, b)$ so that

(5)    $g'(d) = \dfrac{g(P_2) - g(P_1)}{P_2 - P_1}.$

Next, use the facts that $g(P_1) = P_1$ and $g(P_2) = P_2$ to simplify the right side of
equation (5) and obtain

$g'(d) = \dfrac{P_2 - P_1}{P_2 - P_1} = 1.$

But this contradicts the hypothesis in (4) that $|g'(x)| < 1$ over $(a, b)$, so it is not
possible for two fixed points to exist. Therefore, $g(x)$ has a unique fixed point $P$
in $[a, b]$ under the conditions given in (4).


Example 2.3. Apply Theorem 2.2 to rigorously show that $g(x) = \cos(x)$ has a unique
fixed point in $[0, 1]$.

Clearly, $g \in C[0, 1]$. Secondly, $g(x) = \cos(x)$ is a decreasing function on $[0, 1]$; thus
its range on $[0, 1]$ is $[\cos(1), 1] \subseteq [0, 1]$. Thus condition (3) of Theorem 2.2 is satisfied and
$g$ has a fixed point in $[0, 1]$. Finally, if $x \in (0, 1)$, then $|g'(x)| = |-\sin(x)| = \sin(x) \le
\sin(1) < 0.8415 < 1$. Thus $K = \sin(1) < 1$, condition (4) of Theorem 2.2 is satisfied, and
$g$ has a unique fixed point in $[0, 1]$.

We can now state a theorem that can be used to determine whether the fixed-point 
iteration process given in (1) will produce a convergent or divergent sequence. 


Theorem 2.3 (Fixed-point Theorem). Assume that (i) $g, g' \in C[a, b]$, (ii) $K$ is a
positive constant, (iii) $p_0 \in (a, b)$, and (iv) $g(x) \in [a, b]$ for all $x \in [a, b]$.

(6) If $|g'(x)| \le K < 1$ for all $x \in [a, b]$, then the iteration $p_n = g(p_{n-1})$ will
converge to the unique fixed point $P \in [a, b]$. In this case, $P$ is said to be an
attractive fixed point.

(7) If $|g'(x)| > 1$ for all $x \in [a, b]$, then the iteration $p_n = g(p_{n-1})$ will not
converge to $P$. In this case, $P$ is said to be a repelling fixed point and the iteration
exhibits local divergence.

Figure 2.3 The relationship among $P$, $p_0$, $p_1$, $|P - p_0|$, and $|P - p_1|$.

Remark 1. It is assumed that $p_0 \ne P$ in statement (7).

Remark 2. Because $g'$ is continuous on an interval containing $P$, it is permissible to use
the simpler criterion $|g'(P)| \le K < 1$ and $|g'(P)| > 1$ in (6) and (7), respectively.
Proof. We first show that the points $\{p_n\}_{n=0}^{\infty}$ all lie in $(a, b)$. Starting with $p_0$, we
apply Theorem 1.6, the Mean Value Theorem. There exists a value $c_0 \in (a, b)$ so that

(8)    $|P - p_1| = |g(P) - g(p_0)| = |g'(c_0)(P - p_0)|$
       $= |g'(c_0)||P - p_0| \le K|P - p_0| < |P - p_0|.$

Therefore, $p_1$ is no further from $P$ than $p_0$ was, and it follows that $p_1 \in (a, b)$ (see
Figure 2.3). In general, suppose that $p_{n-1} \in (a, b)$; then

(9)    $|P - p_n| = |g(P) - g(p_{n-1})| = |g'(c_{n-1})(P - p_{n-1})|$
       $= |g'(c_{n-1})||P - p_{n-1}| \le K|P - p_{n-1}| < |P - p_{n-1}|.$

Therefore, $p_n \in (a, b)$ and hence, by induction, all the points $\{p_n\}_{n=0}^{\infty}$ lie in $(a, b)$.

To complete the proof of (6), we will show that

(10)    $\lim_{n\to\infty} |P - p_n| = 0.$

First, a proof by induction will establish the inequality

(11)    $|P - p_n| \le K^n|P - p_0|.$

The case $n = 1$ follows from the details in relation (8). Using the induction hypothesis
$|P - p_{n-1}| \le K^{n-1}|P - p_0|$ and the ideas in (9), we obtain

$|P - p_n| \le K|P - p_{n-1}| \le K K^{n-1}|P - p_0| = K^n|P - p_0|.$

Thus, by induction, inequality (11) holds for all $n$. Since $0 < K < 1$, the term $K^n$
goes to zero as $n$ goes to infinity. Hence

(12)    $0 \le \lim_{n\to\infty} |P - p_n| \le \lim_{n\to\infty} K^n|P - p_0| = 0.$

The limit of $|P - p_n|$ is squeezed between zero on the left and zero on the right, so we
can conclude that $\lim_{n\to\infty} |P - p_n| = 0$. Thus $\lim_{n\to\infty} p_n = P$ and, by Theorem 2.1,
the iteration $p_n = g(p_{n-1})$ converges to the fixed point $P$. Therefore, statement (6) of
Theorem 2.3 is proved. We leave statement (7) for the reader to investigate.

Figure 2.4 (a) Monotone convergence when $0 < g'(P) < 1$. (b) Oscillating convergence when $-1 < g'(P) < 0$.

Figure 2.5 (a) Monotone divergence when $1 < g'(P)$. (b) Divergent oscillation when $g'(P) < -1$.


Corollary 2.1. Assume that $g$ satisfies the hypothesis given in (6) of Theorem 2.3.
Bounds for the error involved when using $p_n$ to approximate $P$ are given by

(13)    $|P - p_n| \le K^n|P - p_0|$ for all $n \ge 1$,

and

(14)    $|P - p_n| \le \dfrac{K^n|p_1 - p_0|}{1 - K}$ for all $n \ge 1$.
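The bounds (13) and (14) can be checked numerically on the function of Example 2.3. In this MATLAB sketch the starting value $p_0 = 0.5$, the number of steps, and the use of 200 extra iterations to obtain a reference value for $P$ are all assumptions made for the illustration.

```matlab
% Sketch: compare |P - p_n| with the bounds (13) and (14) for g(x) = cos(x).
g = @(x) cos(x);
K = sin(1);                       % bound on |g'(x)| over [0,1] (Example 2.3)
p0 = 0.5;                         % hypothetical starting value in (0,1)
nmax = 20;

% Reference value for P: iterate far beyond nmax (assumed to be enough).
P = p0;
for k = 1:200, P = g(P); end

p1 = g(p0);
p = p0;
ok13 = true; ok14 = true;
for n = 1:nmax
    p = g(p);                                  % p now holds p_n
    err = abs(P - p);
    bound13 = K^n*abs(P - p0);                 % right side of (13)
    bound14 = K^n*abs(p1 - p0)/(1 - K);        % right side of (14)
    ok13 = ok13 && (err <= bound13);
    ok14 = ok14 && (err <= bound14);
    fprintf('%2d  %.3e  %.3e  %.3e\n', n, err, bound13, bound14);
end
```

Bound (14) is the one that can be evaluated in practice, because it involves only $p_0$, $p_1$, and $K$ and does not require knowing $P$.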


Graphical Interpretation of Fixed-point Iteration 

Since we seek a fixed point $P$ to $g(x)$, it is necessary that the graph of the curve
$y = g(x)$ and the line $y = x$ intersect at the point $(P, P)$. Two simple types of
convergent iteration, monotone and oscillating, are illustrated in Figure 2.4(a) and (b),
respectively.

To visualize the process, start at $p_0$ on the x-axis and move vertically to the point
$(p_0, p_1) = (p_0, g(p_0))$ on the curve $y = g(x)$. Then move horizontally from $(p_0, p_1)$
to the point $(p_1, p_1)$ on the line $y = x$. Finally, move vertically downward to $p_1$ on
the x-axis. The recursion $p_{n+1} = g(p_n)$ is used to construct the point $(p_n, p_{n+1})$ on
the graph, then a horizontal motion locates $(p_{n+1}, p_{n+1})$ on the line $y = x$, and then a
vertical movement ends up at $p_{n+1}$ on the x-axis. The situation is shown in Figure 2.4.

If $|g'(P)| > 1$, then the iteration $p_{n+1} = g(p_n)$ produces a sequence that diverges
away from $P$. The two simple types of divergent iteration, monotone and oscillating,
are illustrated in Figure 2.5(a) and (b), respectively.


Example 2.4. Consider the iteration $p_{n+1} = g(p_n)$ when the function $g(x) = 1 + x - x^2/4$
is used. The fixed points can be found by solving the equation $x = g(x)$. The two solutions
(fixed points of $g$) are $x = -2$ and $x = 2$. The derivative of the function is $g'(x) = 1 - x/2$,
and there are only two cases to consider.

Case (i): $P = -2$. Start with $p_0 = -2.05$; then get

$p_1 = -2.100625$
$p_2 = -2.20378135$
$p_3 = -2.41794441$
$\vdots$
$\lim_{n\to\infty} p_n = -\infty.$

Since $|g'(x)| > \frac{3}{2}$ on $[-3, -1]$, by Theorem 2.3, the sequence will not converge
to $P = -2$.

Case (ii): $P = 2$. Start with $p_0 = 1.6$; then get

$p_1 = 1.96$
$p_2 = 1.9996$
$p_3 = 1.99999996$
$\vdots$
$\lim_{n\to\infty} p_n = 2.$

Since $|g'(x)| < \frac{1}{2}$ on $[1, 3]$, by Theorem 2.3, the sequence will converge to
$P = 2$.
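Both cases of Example 2.4 can be reproduced with the MATLAB sketch below; the array names `A` and `B` are illustrative, and entry `k+1` stores $p_k$.

```matlab
% Sketch: fixed-point iteration for g(x) = 1 + x - x^2/4 (Example 2.4).
g  = @(x) 1 + x - x.^2/4;
dg = @(x) 1 - x/2;                % derivative g'(x)

A = zeros(1,4); A(1) = -2.05;     % case (i):  start near P = -2
B = zeros(1,4); B(1) = 1.6;       % case (ii): start near P =  2
for k = 1:3
    A(k+1) = g(A(k));
    B(k+1) = g(B(k));
end
fprintf('case (i):  %.8f  %.8f  %.8f\n', A(2:4));
fprintf('case (ii): %.8f  %.8f  %.8f\n', B(2:4));
fprintf('|g''(-2)| = %.1f, |g''(2)| = %.1f\n', abs(dg(-2)), abs(dg(2)));
```

The derivative values at the two fixed points show why the first sequence moves away from $P = -2$ while the second is drawn toward $P = 2$.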


Theorem 2.3 does not state what will happen when $g'(P) = 1$. The next example
has been specially constructed so that the sequence $\{p_n\}$ converges whenever $p_0 > P$
and it diverges if we choose $p_0 < P$.


Example 2.5. Consider the iteration $p_{n+1} = g(p_n)$ when the function $g(x) = 2(x - 1)^{1/2}$
for $x \ge 1$ is used. Only one fixed point $P = 2$ exists. The derivative is $g'(x) = 1/(x - 1)^{1/2}$
and $g'(2) = 1$, so Theorem 2.3 does not apply. There are two cases to consider when the
starting value lies to the left or right of $P = 2$.

Case (i): Start with $p_0 = 1.5$; then get

$p_1 = 1.41421356$
$p_2 = 1.28718851$
$p_3 = 1.07179943$
$p_4 = 0.53590832$
$p_5 = 2(-0.46409168)^{1/2}.$

Since $p_4$ lies outside the domain of $g(x)$, the term $p_5$ cannot be computed.

Case (ii): Start with $p_0 = 2.5$; then get

$p_1 = 2.44948974$
$p_2 = 2.40789513$
$p_3 = 2.37309514$
$p_4 = 2.34358284$
$\vdots$
$\lim_{n\to\infty} p_n = 2.$

This sequence is converging too slowly to the value $P = 2$; indeed, $p_{1000} =
2.00398714$.

Absolute and Relative Error Considerations

In Example 2.5, case (ii), the sequence converges slowly, and after 1000 iterations the
three consecutive terms are

$p_{1000} = 2.00398714$, $p_{1001} = 2.00398317$, and $p_{1002} = 2.00397921$.

This should not be disturbing; after all, we could compute a few thousand more terms
and find a better approximation! But what about a criterion for stopping the iteration?
The difference between consecutive terms is

$|p_{1001} - p_{1002}| = |2.00398317 - 2.00397921| = 0.00000396.$

Yet the absolute error in the approximation $p_{1000}$ is known to be

$|P - p_{1000}| = |2.00000000 - 2.00398714| = 0.00398714.$

This is about 1000 times larger than $|p_{1001} - p_{1002}|$, and it shows that closeness of
consecutive terms does not guarantee that accuracy has been achieved. But it is usually
the only criterion available and is often used to terminate an iterative procedure.
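The gap between the size of a step and the true error in Example 2.5, case (ii), can be observed directly. The MATLAB sketch below is illustrative only; `seq(k)` stores $p_k$ for $k \ge 1$.

```matlab
% Sketch: slow convergence of p_{n+1} = 2*sqrt(p_n - 1) from p_0 = 2.5.
g = @(x) 2*sqrt(x - 1);
P = 2;                            % the fixed point
p = 2.5;                          % p_0 of case (ii)
seq = zeros(1002,1);              % seq(k) stores p_k
for k = 1:1002
    p = g(p);
    seq(k) = p;
end
step = abs(seq(1001) - seq(1002));    % |p_1001 - p_1002|
err  = abs(P - seq(1000));            % |P - p_1000|
fprintf('p_1000            = %.8f\n', seq(1000));
fprintf('|p_1001 - p_1002| = %.8f\n', step);
fprintf('|P - p_1000|      = %.8f\n', err);
fprintf('ratio             = %.0f\n', err/step);
```

A stopping test based only on `step` would report convergence long before the error itself is that small, which is the caution raised in the discussion.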

Program 2.1 (Fixed-Point Iteration). To approximate a solution to the equation
$x = g(x)$ starting with the initial guess $p_0$ and iterating $p_{n+1} = g(p_n)$.


function [k,p,err,P]=fixpt(g,p0,tol,max1)
% Input - g is the iteration function input as a string 'g'
%       - p0 is the initial guess for the fixed point
%       - tol is the tolerance
%       - max1 is the maximum number of iterations
%Output - k is the number of iterations that were carried out
%       - p is the approximation to the fixed point
%       - err is the error in the approximation
%       - P contains the sequence {pn}
P(1)=p0;
for k=2:max1
   P(k)=feval(g,P(k-1));
   err=abs(P(k)-P(k-1));
   relerr=err/(abs(P(k))+eps);
   p=P(k);
   % Stop when either the absolute or the relative change is small.
   if (err<tol) || (relerr<tol),break;end
end
if k == max1
   disp('maximum number of iterations exceeded')
end
P=P';

Remark. When using the user-defined function fixpt, it is necessary to input the
M-file g.m as a string: 'g' (see MATLAB Appendix).

Exercises for Iteration for Solving $x = g(x)$

1. Determine rigorously if each function has a unique fixed point on the given interval
(follow Example 2.3).

(a) $g(x) = 1 - x^2/4$ on $[0, 1]$

(b) $g(x) = 2^{-x}$ on $[0, 1]$

(c) $g(x) = 1/x$ on $[0.5, 5.2]$

2. Investigate the nature of the fixed-point iteration when

$g(x) = -4 + 4x - \frac{1}{2}x^2.$

(a) Solve $g(x) = x$ and show that $P = 2$ and $P = 4$ are fixed points.

(b) Use the starting value $p_0 = 1.9$ and compute $p_1$, $p_2$, and $p_3$.

(c) Use the starting value $p_0 = 3.8$ and compute $p_1$, $p_2$, and $p_3$.

(d) Find the errors $E_k$ and relative errors $R_k$ for the values $p_k$ in parts (b) and (c).

(e) What conclusions can be drawn from Theorem 2.3?

3. Graph $g(x)$, the line $y = x$, and the given fixed point $P$ on the same coordinate
system. Using the given starting value $p_0$, compute $p_1$ and $p_2$. Construct figures
similar to Figures 2.4 and 2.5. Based on your graph, determine geometrically if
fixed-point iteration converges.

(a) $g(x) = (6 + x)^{1/2}$, $P = 3$, and $p_0 = 7$

(b) $g(x) = 1 + 2/x$, $P = 2$, and $p_0 = 4$

(c) $g(x) = x^2/3$, $P = 3$, and $p_0 = 3.5$

(d) $g(x) = -x^2 + 2x + 2$, $P = 2$, and $p_0 = 2.5$

4. Let $g(x) = x^2 + x - 4$. Can fixed-point iteration be used to find the solution(s) to the
equation $x = g(x)$? Why?

5. Let $g(x) = x\cos(x)$. Solve $x = g(x)$ and find all the fixed points of $g$ (there are
infinitely many). Can fixed-point iteration be used to find the solution(s) to the equation
$x = g(x)$? Why?

6. Suppose that $g(x)$ and $g'(x)$ are defined and continuous on $(a, b)$; $p_0, p_1, p_2 \in (a, b)$;
and $p_1 = g(p_0)$ and $p_2 = g(p_1)$. Also, assume that there exists a constant $K$ such
that $|g'(x)| < K$. Show that $|p_2 - p_1| < K|p_1 - p_0|$. Hint. Use the Mean Value
Theorem.

7. Suppose that $g(x)$ and $g'(x)$ are continuous on $(a, b)$ and that $|g'(x)| > 1$ on this
interval. If the fixed point $P$ and the initial approximations $p_0$ and $p_1$ lie in the interval
$(a, b)$, then show that $p_1 = g(p_0)$ implies that $|E_1| = |P - p_1| > |P - p_0| = |E_0|$.
Hence statement (7) of Theorem 2.3 is established (local divergence).

8. Let $g(x) = -0.0001x^2 + x$ and $p_0 = 1$, and consider fixed-point iteration.

(a) Show that $p_0 > p_1 > \cdots > p_n > p_{n+1} > \cdots$.

(b) Show that $p_n > 0$ for all $n$.

(c) Since the sequence $\{p_n\}$ is decreasing and bounded below, it has a limit. What
is the limit?

9. Let $g(x) = 0.5x + 1.5$ and $p_0 = 4$, and consider fixed-point iteration.

(a) Show that the fixed point is $P = 3$.

(b) Show that $|P - p_n| = |P - p_{n-1}|/2$ for $n = 1, 2, 3, \ldots$

(c) Show that $|P - p_n| = |P - p_0|/2^n$ for $n = 1, 2, 3, \ldots$

10. Let $g(x) = x/2$, and consider fixed-point iteration.

(a) Find the quantity $|p_{k+1} - p_k|/|p_{k+1}|$.

(b) Discuss what will happen if only the relative error stopping criterion were used
in Program 2.1.

11. For fixed-point iteration, discuss why it is an advantage to have $g'(P) \approx 0$.

Algorithms and Programs

1. Use Program 2.1 to approximate the fixed points (if any) of each function. Answers
should be accurate to 12 decimal places. Produce a graph of each function and the
line $y = x$ that clearly shows any fixed points.

(a) $g(x) = x^5 - 3x^3 - 2x^2 + 2$

(b) $g(x) = \cos(\sin(x))$

(c) $g(x) = x^2 - \sin(x + 0.15)$

(d) $g(x) = x^{x - \cos(x)}$


2.2 Bracketing Methods for Locating a Root

Consider a familiar topic of interest. Suppose that you save money by making regular
monthly deposits $P$ and the annual interest rate is $I$; then the total amount $A$ after $N$
deposits is

(1)    $A = P + P\left(1 + \dfrac{I}{12}\right) + P\left(1 + \dfrac{I}{12}\right)^2 + \cdots + P\left(1 + \dfrac{I}{12}\right)^{N-1}.$

The first term on the right side of equation (1) is the last payment. Then the next-to-last
payment, which has earned one period of interest, contributes $P\left(1 + \frac{I}{12}\right)$. The
second-from-last payment has earned two periods of interest and contributes $P\left(1 + \frac{I}{12}\right)^2$, and
so on. Finally, the first payment, which has earned interest for $N - 1$ periods, contributes
$P\left(1 + \frac{I}{12}\right)^{N-1}$ toward the total. Recall that the formula for the sum of the $N$ terms of
a geometric series is

(2)    $1 + r + r^2 + r^3 + \cdots + r^{N-1} = \dfrac{1 - r^N}{1 - r}.$

We can write (1) in the form

$A = P\left(1 + \left(1 + \dfrac{I}{12}\right) + \left(1 + \dfrac{I}{12}\right)^2 + \cdots + \left(1 + \dfrac{I}{12}\right)^{N-1}\right)$

and use the substitution $r = (1 + I/12)$ in (2) to obtain

$A = P\,\dfrac{1 - \left(1 + \frac{I}{12}\right)^N}{1 - \left(1 + \frac{I}{12}\right)}.$

This can be simplified to obtain the annuity-due equation,

(3)    $A = \dfrac{P}{I/12}\left(\left(1 + \dfrac{I}{12}\right)^N - 1\right).$

The following example uses the annuity-due equation and requires a sequence of 
repeated calculations to find an answer. 


Example 2.6. You save $250 per month for 20 years and desire that the total value of
all payments and interest is $250,000 at the end of the 20 years. What interest rate $I$ is
needed to achieve your goal? If we hold $N = 240$ fixed, then $A$ is a function of $I$ alone;
that is, $A = A(I)$. We will start with two guesses, $I_0 = 0.12$ and $I_1 = 0.13$, and perform a
sequence of calculations to narrow down the final answer. Starting with $I_0 = 0.12$ yields

$A(0.12) = \dfrac{250}{0.12/12}\left(\left(1 + \dfrac{0.12}{12}\right)^{240} - 1\right) = 247{,}314.$

Since this value is a little short of the goal, we next try $I_1 = 0.13$:

$A(0.13) = \dfrac{250}{0.13/12}\left(\left(1 + \dfrac{0.13}{12}\right)^{240} - 1\right) = 283{,}311.$

This is a little high, so we try the value in the middle, $I_2 = 0.125$:

$A(0.125) = \dfrac{250}{0.125/12}\left(\left(1 + \dfrac{0.125}{12}\right)^{240} - 1\right) = 264{,}623.$

This is again high and we conclude that the desired rate lies in the interval $[0.12, 0.125]$.
The next guess is the midpoint $I_3 = 0.1225$:

$A(0.1225) = \dfrac{250}{0.1225/12}\left(\left(1 + \dfrac{0.1225}{12}\right)^{240} - 1\right) = 255{,}803.$

This is high and the interval is now narrowed to $[0.12, 0.1225]$. Our last calculation uses
the midpoint approximation $I_4 = 0.12125$:

$A(0.12125) = \dfrac{250}{0.12125/12}\left(\left(1 + \dfrac{0.12125}{12}\right)^{240} - 1\right) = 251{,}518.$

Further iterations can be done to obtain as many significant digits as required. The
purpose of this example was to find the value of $I$ that produced a specified level $L$ of the
function value, that is, to find a solution to $A(I) = L$. It is standard practice to place the
constant $L$ on the left and solve the equation $A(I) - L = 0$.

Figure 2.6 The decision process for the bisection process. (a) If $f(a)$ and $f(c)$ have opposite signs then squeeze from the right. (b) If $f(c)$ and $f(b)$ have opposite signs then squeeze from the left.
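The repeated halving in Example 2.6 is easy to automate. The MATLAB sketch below assumes the values $P = 250$, $N = 240$, and $L = 250{,}000$ of the example; the number of halvings (40) and the variable names are choices made for this illustration.

```matlab
% Sketch: solve A(I) - L = 0 for the interest rate I by repeated halving.
P = 250; N = 240; L = 250000;
A = @(I) (P./(I/12)).*((1 + I/12).^N - 1);   % annuity-due equation (3)

a = 0.12; b = 0.13;               % A(a) < L < A(b), as in the example
for k = 1:40
    c = (a + b)/2;
    if A(c) > L
        b = c;                    % value too high: keep the left half
    else
        a = c;                    % value too low: keep the right half
    end
end
I = (a + b)/2;
fprintf('A(0.12)    = %.0f\n', A(0.12));
fprintf('A(0.12125) = %.0f\n', A(0.12125));
fprintf('I = %.8f gives A(I) = %.2f\n', I, A(I));
```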

Definition 2.3 (Root of an Equation, Zero of a Function). Assume that $f(x)$ is a
continuous function. Any number $r$ for which $f(r) = 0$ is called a root of the equation
$f(x) = 0$. Also, we say $r$ is a zero of the function $f(x)$.

For example, the equation $2x^2 + 5x - 3 = 0$ has two real roots $r_1 = 0.5$ and
$r_2 = -3$, whereas the corresponding function $f(x) = 2x^2 + 5x - 3 = (2x - 1)(x + 3)$
has two real zeros, $r_1 = 0.5$ and $r_2 = -3$.


The Bisection Method of Bolzano

In this section we develop our first bracketing method for finding a zero of a continuous
function. We must start with an initial interval $[a, b]$, where $f(a)$ and $f(b)$ have
opposite signs. Since the graph $y = f(x)$ of a continuous function is unbroken, it will
cross the x-axis at a zero $x = r$ that lies somewhere in the interval (see Figure 2.6). The
bisection method systematically moves the end points of the interval closer and closer
together until we obtain an interval of arbitrarily small width that brackets the zero.
The decision step for this process of interval halving is first to choose the midpoint
$c = (a + b)/2$ and then to analyze the three possibilities that might arise:

(4) If $f(a)$ and $f(c)$ have opposite signs, a zero lies in $[a, c]$.

(5) If $f(c)$ and $f(b)$ have opposite signs, a zero lies in $[c, b]$.

(6) If $f(c) = 0$, then the zero is $c$.

If either case (4) or (5) occurs, we have found an interval half as wide as the original
interval that contains the root, and we are "squeezing down on it" (see Figure 2.6). To
continue the process, relabel the new smaller interval $[a, b]$ and repeat the process until
the interval is as small as desired. Since the bisection process involves sequences of
nested intervals and their midpoints, we will use the following notation to keep track
of the details in the process:

(7)    $[a_0, b_0]$ is the starting interval and $c_0 = \frac{a_0 + b_0}{2}$ is the midpoint.
       $[a_1, b_1]$ is the second interval, which brackets the zero $r$, and $c_1$ is its midpoint;
       the interval $[a_1, b_1]$ is half as wide as $[a_0, b_0]$.
       After arriving at the $n$th interval $[a_n, b_n]$, which brackets $r$ and has midpoint
       $c_n$, the interval $[a_{n+1}, b_{n+1}]$ is constructed, which also brackets $r$ and is half
       as wide as $[a_n, b_n]$.

It is left as an exercise for the reader to show that the sequence of left end points is
increasing and the sequence of right end points is decreasing; that is,

(8)    $a_0 \le a_1 \le \cdots \le a_n \le \cdots \le r \le \cdots \le b_n \le \cdots \le b_1 \le b_0,$

where $c_n = \frac{a_n + b_n}{2}$, and if $f(a_{n+1})f(b_{n+1}) < 0$, then

(9)    $[a_{n+1}, b_{n+1}] = [a_n, c_n]$ or $[a_{n+1}, b_{n+1}] = [c_n, b_n]$ for all $n$.


Theorem 2.4 (Bisection Theorem). Assume that $f \in C[a, b]$ and that there exists
a number $r \in [a, b]$ such that $f(r) = 0$. If $f(a)$ and $f(b)$ have opposite signs, and
$\{c_n\}_{n=0}^{\infty}$ represents the sequence of midpoints generated by the bisection process of (8)
and (9), then

(10)    $|r - c_n| \le \dfrac{b - a}{2^{n+1}}$ for $n = 0, 1, \ldots,$

and therefore the sequence $\{c_n\}_{n=0}^{\infty}$ converges to the zero $x = r$; that is,

(11)    $\lim_{n\to\infty} c_n = r.$

Proof. Since both the zero $r$ and the midpoint $c_n$ lie in the interval $[a_n, b_n]$, the
distance between $c_n$ and $r$ cannot be greater than half the width of this interval (see
Figure 2.7). Thus

(12)    $|r - c_n| \le \dfrac{b_n - a_n}{2}$ for all $n$.

Since each interval is half as wide as its predecessor, $b_n - a_n = (b_0 - a_0)/2^n$, and
combining this with (12) gives (10).

Figure 2.7 The root $r$ and midpoint $c_n$ of $[a_n, b_n]$ for the bisection method.



Example 2.7. The function $h(x) = x\sin(x)$ occurs in the study of undamped forced
oscillations. Find the value of $x$ that lies in the interval $[0, 2]$, where the function takes on
the value $h(x) = 1$ (the function $\sin(x)$ is evaluated in radians).

We use the bisection method to find a zero of the function $f(x) = x\sin(x) - 1$. Starting
with $a_0 = 0$ and $b_0 = 2$, we compute

$f(0) = -1.000000$ and $f(2) = 0.818595$,

so a root of $f(x) = 0$ lies in the interval $[0, 2]$. At the midpoint $c_0 = 1$, we find that
$f(1) = -0.158529$. Hence the function changes sign on $[c_0, b_0] = [1, 2]$.

To continue, we squeeze from the left and set $a_1 = c_0$ and $b_1 = b_0$. The midpoint
is $c_1 = 1.5$ and $f(c_1) = 0.496242$. Now, $f(1) = -0.158529$ and $f(1.5) = 0.496242$
imply that the root lies in the interval $[a_1, c_1] = [1.0, 1.5]$. The next decision is to squeeze
from the right and set $a_2 = a_1$ and $b_2 = c_1$. In this manner we obtain a sequence $\{c_k\}$ that
converges to $r \approx 1.114157141$. A sample calculation is given in Table 2.1.






Table 2.1  Bisection Method Solution of x sin(x) - 1 = 0

 k   Left end point, a_k   Midpoint, c_k   Right end point, b_k   Function value, f(c_k)
 0   0                     1.              2.                     -0.158529
 1   1.0                   1.5             2.0                     0.496242
 2   1.00                  1.25            1.50                    0.186231
 3   1.000                 1.125           1.250                   0.015051
 4   1.0000                1.0625          1.1250                 -0.071827
 5   1.06250               1.09375         1.12500                -0.028362
 6   1.093750              1.109375        1.125000               -0.006643
 7   1.1093750             1.1171875       1.1250000               0.004208
 8   1.10937500            1.11328125      1.11718750             -0.001216
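The rows of Table 2.1 can be regenerated with the MATLAB sketch below, which also compares each midpoint with the bound (10) of Theorem 2.4. It is a bare loop written for this illustration rather than a general-purpose routine; the value of $r$ is the limit quoted in Example 2.7.

```matlab
% Sketch: bisection for f(x) = x sin(x) - 1 on [0,2] (Example 2.7).
f = @(x) x.*sin(x) - 1;
a = 0; b = 2;
width0 = b - a;
r = 1.114157141;                  % limit quoted in Example 2.7
C = zeros(9,1); Y = zeros(9,1);   % C(k+1) = c_k, Y(k+1) = f(c_k)
for k = 0:8
    c = (a + b)/2;
    yc = f(c);
    C(k+1) = c; Y(k+1) = yc;
    fprintf('%d  %.8f  %.8f  %.8f  %9.6f\n', k, a, c, b, yc);
    if f(a)*yc < 0
        b = c;                    % case (4): squeeze from the right
    elseif yc == 0
        a = c; b = c;             % case (6): c is the zero
    else
        a = c;                    % case (5): squeeze from the left
    end
end
bound = width0 ./ 2.^((0:8)' + 1);    % right side of (10) for n = 0,...,8
inBound = all(abs(r - C) <= bound);
```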


Figure 2.8 The decision process for the false position method. (a) If $f(a)$ and $f(c)$ have opposite signs then squeeze from the right. (b) If $f(c)$ and $f(b)$ have opposite signs then squeeze from the left.


A virtue of the bisection method is that formula (10) provides a predetermined
estimate for the accuracy of the computed solution. In Example 2.7 the width of the
starting interval was $b_0 - a_0 = 2$. Suppose that Table 2.1 were continued to the
thirty-first iterate; then, by (10), the error bound would be $|E_{31}| \le (2 - 0)/2^{32} \approx
4.656613 \times 10^{-10}$. Hence $c_{31}$ would be an approximation to $r$ with nine decimal places
of accuracy. The number $N$ of repeated bisections needed to guarantee that the $N$th
midpoint $c_N$ is an approximation to a zero and has an error less than the preassigned
value $\delta$ is

(15)    $N = \operatorname{int}\left(\dfrac{\ln(b - a) - \ln(\delta)}{\ln(2)}\right).$

The proof of this formula is left as an exercise.

Another popular algorithm is the method of false position or the regula falsi
method. It was developed because the bisection method converges at a fairly slow
speed. As before, we assume that $f(a)$ and $f(b)$ have opposite signs. The bisection
method used the midpoint of the interval $[a, b]$ as the next iterate. A better
approximation is obtained if we find the point $(c, 0)$ where the secant line $L$ joining the points
$(a, f(a))$ and $(b, f(b))$ crosses the x-axis (see Figure 2.8). To find the value $c$, we
write down two versions of the slope $m$ of the line $L$:

(16)    $m = \dfrac{f(b) - f(a)}{b - a},$

where the points $(a, f(a))$ and $(b, f(b))$ are used, and

(17)    $m = \dfrac{0 - f(b)}{c - b},$

where the points $(c, 0)$ and $(b, f(b))$ are used.

Equating the slopes in (16) and (17), we have

$\dfrac{f(b) - f(a)}{b - a} = \dfrac{0 - f(b)}{c - b},$

which is easily solved for $c$ to get

(18)    $c = b - \dfrac{f(b)(b - a)}{f(b) - f(a)}.$

The three possibilities are the same as before:

(19) If $f(a)$ and $f(c)$ have opposite signs, a zero lies in $[a, c]$.

(20) If $f(c)$ and $f(b)$ have opposite signs, a zero lies in $[c, b]$.

(21) If $f(c) = 0$, then the zero is $c$.

Convergence of the False Position Method

The decision process implied by (19) and (20) along with (18) is used to construct
a sequence of intervals $\{[a_n, b_n]\}$ each of which brackets the zero. At each step the
approximation of the zero $r$ is

(22)    $c_n = b_n - \dfrac{f(b_n)(b_n - a_n)}{f(b_n) - f(a_n)},$

and it can be proved that the sequence $\{c_n\}$ will converge to $r$. But beware; although
the interval width $b_n - a_n$ is getting smaller, it is possible that it may not go to zero. If
the graph of $y = f(x)$ is concave near $(r, 0)$, one of the end points becomes fixed and
the other one marches into the solution (see Figure 2.9).

Figure 2.9 The stationary endpoint for the false position method.

Now we rework the solution to $x\sin(x) - 1 = 0$ using the method of false position
and observe that it converges faster than the bisection method. Also, notice that
$\{b_n - a_n\}_{n=0}^{\infty}$ does not go to zero.


Example 2.8. Use the false position method to find the root of $x\sin(x) - 1 = 0$ that is
located in the interval $[0, 2]$ (the function $\sin(x)$ is evaluated in radians).

Starting with $a_0 = 0$ and $b_0 = 2$, we have $f(0) = -1.00000000$ and $f(2) =
0.81859485$, so a root lies in the interval $[0, 2]$. Using formula (22), we get

$c_0 = 2 - \dfrac{0.81859485(2 - 0)}{0.81859485 - (-1)} = 1.09975017$ and $f(c_0) = -0.02001921.$

The function changes sign on the interval $[c_0, b_0] = [1.09975017, 2]$, so we squeeze from
the left and set $a_1 = c_0$ and $b_1 = b_0$. Formula (22) produces the next approximation:

$c_1 = 2 - \dfrac{0.81859485(2 - 1.09975017)}{0.81859485 - (-0.02001921)} = 1.12124074$ and $f(c_1) = 0.00983461.$

Next $f(x)$ changes sign on $[a_1, c_1] = [1.09975017, 1.12124074]$, and the next decision is
to squeeze from the right and set $a_2 = a_1$ and $b_2 = c_1$. A summary of the calculations is
given in Table 2.2.

The termination criterion used in the bisection method is not useful for the false
position method and may result in an infinite loop. The closeness of consecutive
iterates and the size of $|f(c_n)|$ are both used in the termination criterion for Program 2.3.
In Section 2.3 we discuss the reasons for this choice.
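For the bisection method, by contrast, the number of steps can be fixed before the iteration starts by using the estimate (10) and formula (15). The MATLAB sketch below evaluates both for the interval of Example 2.7; the tolerance `delta` is a hypothetical value chosen for the illustration, and `floor` plays the role of "int".

```matlab
% Sketch: number of bisections N from formula (15) and the bound (10).
a = 0; b = 2;                     % starting interval of Example 2.7
delta = 5e-10;                    % hypothetical preassigned tolerance
N = floor((log(b - a) - log(delta))/log(2));   % formula (15)
bound = (b - a)/2^(N + 1);                      % estimate (10) at c_N
fprintf('N = %d, error bound = %.6e\n', N, bound);
```

No comparable advance count is available for the false position method, because its interval width need not shrink to zero.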
Table 2.2  False Position Method Solution of x sin(x) - 1 = 0

 k   Left end point, a_k   Approximation, c_k   Right end point, b_k   Function value, f(c_k)
 0   0.00000000            1.09975017           2.00000000             -0.02001921
 1   1.09975017            1.12124074           2.00000000              0.00983461
 2   1.09975017            1.11416120           1.12124074              0.00000563
 3   1.09975017            1.11415714           1.11416120              0.00000000
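The entries of Table 2.2 can be regenerated with the following MATLAB sketch, which applies formula (22) four times and records the width of each bracketing interval. It is a bare loop written for this illustration, not Program 2.3, and the array names are assumptions.

```matlab
% Sketch: false position for f(x) = x sin(x) - 1 on [0,2] (Example 2.8).
f = @(x) x.*sin(x) - 1;
a = 0; b = 2;
ya = f(a); yb = f(b);
C = zeros(4,1); W = zeros(4,1);   % C(k+1) = c_k, W(k+1) = b_k - a_k
for k = 0:3
    W(k+1) = b - a;
    c = b - yb*(b - a)/(yb - ya);     % formula (22)
    yc = f(c);
    C(k+1) = c;
    fprintf('%d  %.8f  %.8f  %.8f  %11.8f\n', k, a, c, b, yc);
    if yc == 0
        break                     % case (21): c is the zero
    elseif yb*yc > 0
        b = c; yb = yc;           % case (19): squeeze from the right
    else
        a = c; ya = yc;           % case (20): squeeze from the left
    end
end
fprintf('final interval width: %.8f\n', b - a);
```

The left end point stops moving after the first step, so the printed width stays well above the size of the error in $c_3$; this is the behavior that makes an interval-width test unsuitable here.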


Program 2.2 (Bisection Method). To approximate a root of the equation f(x) = 0
in the interval [a, b]. Proceed with the method only if f(x) is continuous and f(a)
and f(b) have opposite signs.

function [c,err,yc]=bisect(f,a,b,delta)
%Input  - f is the function input as a string 'f'
%       - a and b are the left and right end points
%       - delta is the tolerance
%Output - c is the zero
%       - yc=f(c)
%       - err is the error estimate for c
ya=feval(f,a);
yb=feval(f,b);
% Stop if the end points do not bracket a root
if ya*yb>0, error('f(a) and f(b) must have opposite signs'), end
max1=1+round((log(b-a)-log(delta))/log(2));
for k=1:max1
   c=(a+b)/2;
   yc=feval(f,c);
   if yc==0
      a=c;
      b=c;
   elseif yb*yc>0
      b=c;
      yb=yc;
   else
      a=c;
      ya=yc;
   end
   if b-a < delta, break,end
end
c=(a+b)/2;
err=abs(b-a);
yc=feval(f,c);
Program 2.3 (False Position or Regula Falsi Method). To approximate a root of
the equation f(x) = 0 in the interval [a, b]. Proceed with the method only if f(x)
is continuous and f(a) and f(b) have opposite signs.

function [c,err,yc]=regula(f,a,b,delta,epsilon,max1)
%Input  - f is the function input as a string 'f'
%       - a and b are the left and right end points
%       - delta is the tolerance for the zero
%       - epsilon is the tolerance for the value of f at the zero
%       - max1 is the maximum number of iterations
%Output - c is the zero
%       - yc=f(c)
%       - err is the error estimate for c
ya=feval(f,a);
yb=feval(f,b);
% Stop if the end points do not bracket a root
if ya*yb>0
   error('Note: f(a)*f(b)>0');
end
for k=1:max1
   dx=yb*(b-a)/(yb-ya);
   c=b-dx;
   ac=c-a;
   yc=feval(f,c);
   if yc==0, break;
   elseif yb*yc>0
      b=c;
      yb=yc;
   else
      a=c;
      ya=yc;
   end
   dx=min(abs(dx),ac);
   if abs(dx)<delta,break,end
   if abs(yc)<epsilon,break,end
end
c;
err=abs(b-a)/2;
yc=feval(f,c);
Exercises for Bracketing Methods


In Exercises 1 and 2, find an approximation for the interest rate I that will yield the total
annuity value A if 240 monthly payments P are made. Use the two starting values for I
and compute the next three approximations using the bisection method.

1. P = $275, A = $250,000, $I_0 = 0.11$, $I_1 = 0.12$

2. P = $325, A = $400,000, $I_0 = 0.13$, $I_1 = 0.14$

3. For each function, find an interval [a, b] so that f(a) and f(b) have opposite signs.
(a) $f(x) = e^x - 2 - x$

(b) $f(x) = \cos(x) + 1 - x$

(c) $f(x) = \ln(x) - 5 + x$

(d) $f(x) = x^2 - 10x + 23$

In Exercises 4 through 7 start with $[a_0, b_0]$ and use the false position method to compute
$c_0$, $c_1$, $c_2$, and $c_3$.

4. $e^x - 2 - x = 0$, $[a_0, b_0] = [-2.4, -1.6]$

5. $\cos(x) + 1 - x = 0$, $[a_0, b_0] = [0.8, 1.6]$

6. $\ln(x) - 5 + x = 0$, $[a_0, b_0] = [3.2, 4.0]$

7. $x^2 - 10x + 23 = 0$, $[a_0, b_0] = [6.0, 6.8]$

8. Denote the intervals that arise in the bisection method by $[a_0, b_0]$, $[a_1, b_1]$, ...,
$[a_n, b_n]$.

(a) Show that $a_0 \le a_1 \le \cdots \le a_n \le \cdots$ and that $\cdots \le b_n \le \cdots \le b_1 \le b_0$.

(b) Show that $b_n - a_n = (b_0 - a_0)/2^n$.

(c) Let the midpoint of each interval be $c_n = (a_n + b_n)/2$. Show that

$$\lim_{n\to\infty} a_n = \lim_{n\to\infty} c_n = \lim_{n\to\infty} b_n.$$

Hint. Review convergence of monotone sequences in your calculus book.

9. What will happen if the bisection method is used with the function $f(x) = 1/(x - 2)$
and

(a) the interval is [3, 7]? (b) the interval is [1, 7]?

10. What will happen if the bisection method is used with the function $f(x) = \tan(x)$
and

(a) the interval is [3, 4]? (b) the interval is [1, 3]?

11. Suppose that the bisection method is used to find a zero of f(x) in the interval [2, 7].
How many times must this interval be bisected to guarantee that the approximation
$c_N$ has an accuracy of $5 \times 10^{-9}$?

12. Show that formula (22) for the false position method is algebraically equivalent to

$$c_n = \frac{a_n f(b_n) - b_n f(a_n)}{f(b_n) - f(a_n)}.$$
13. Establish formula (15) for determining the number of iterations required in the
bisection method. Hint. Use $|b - a|/2^{n+1} < \delta$ and take logarithms.

14. The polynomial $f(x) = (x - 1)^3 (x - 2)(x - 3)$ has three zeros: x = 1 of multiplicity 3
and x = 2 and x = 3, each of multiplicity 1. If $a_0$ and $b_0$ are any two real numbers
such that $a_0 < 1$ and $b_0 > 3$, then $f(a_0)f(b_0) < 0$. Thus, on the interval $[a_0, b_0]$
the bisection method will converge to one of the three zeros. If $a_0 < 1$ and $b_0 > 3$
are selected such that $c_n = (a_n + b_n)/2$ is not equal to 1, 2, or 3 for any $n \ge 1$, then the
bisection method will never converge to which zero(s)? Why?

15. If a polynomial, f(x), has an odd number of real zeros in the interval $[a_0, b_0]$, and
each of the zeros is of odd multiplicity, then $f(a_0)f(b_0) < 0$, and the bisection
method will converge to one of the zeros. If $a_0 < 1$ and $b_0 > 3$ are selected such that
$c_n = (a_n + b_n)/2$ is not equal to any of the zeros of f(x) for any $n \ge 1$, then the bisection
method will never converge to which zero(s)? Why?


Algorithms and Programs 


1. Find an approximation (accurate to 10 decimal places) for the interest rate I that will
yield a total annuity value of $500,000 if 240 monthly payments of $300 are made.

2. Consider a spherical ball of radius r = 15 cm that is constructed from a variety
of white oak that has a density of $\rho = 0.710$. How much of the ball (accurate to
8 decimal places) will be submerged when it is placed in water?

3. Modify Programs 2.2 and 2.3 to output a matrix analogous to Tables 2.1 and 2.2,
respectively (i.e., the first row of the matrix would be $[0 \;\; a_0 \;\; c_0 \;\; b_0 \;\; f(c_0)]$).

4. Use your programs from Problem 3 to approximate the three smallest positive roots
of x = tan(x) (accurate to 8 decimal places).

5. A unit sphere is cut into two segments by a plane. One segment has three times the
volume of the other. Determine the distance x of the plane from the center of the
sphere (accurate to 10 decimal places).


2.3 Initial Approximation and Convergence Criteria

The bracketing methods depend on finding an interval [a, b] so that f(a) and f(b) have
opposite signs. Once the interval has been found, no matter how large, the iterations
will proceed until a root is found. Hence these methods are called globally convergent.
However, if f(x) = 0 has several roots in [a, b], then a different starting interval must
be used to find each root. It is not easy to locate these smaller intervals on which f(x)
changes sign.

In Section 2.4 we develop the Newton-Raphson method and the secant method for
solving f(x) = 0. Both of these methods require that a close approximation to the root
be given to guarantee convergence. Hence these methods are called locally convergent.
They usually converge more rapidly than do global ones. Some hybrid algorithms start
with a globally convergent method and switch to a locally convergent method when
the iteration gets close to a root.

If the computation of roots is one part of a larger project, then a leisurely pace
is suggested and the first thing to do is graph the function. We can view the graph
y = f(x) and make decisions based on what it looks like (concavity, slope, oscillatory
behavior, local extrema, inflection points, etc.). But more important, if the coordinates
of points on the graph are available, they can be analyzed and the approximate location
of roots determined. These approximations can then be used as starting values in our
root-finding algorithms.

We must proceed carefully. Computer software packages use graphics software of
varying sophistication. Suppose that a computer is used to graph y = f(x) on [a, b].
Typically, the interval is partitioned into N + 1 equally spaced points: $a = x_0 <
x_1 < \cdots < x_N = b$ and the function values $y_k = f(x_k)$ computed. Then either a
line segment or a “fitted curve” is plotted between consecutive points $(x_{k-1}, y_{k-1})$
and $(x_k, y_k)$ for k = 1, 2, ..., N. There must be enough points so that we do not
miss a root in a portion of the curve where the function is changing rapidly. If f(x)
is continuous and two adjacent points $(x_{k-1}, y_{k-1})$ and $(x_k, y_k)$ lie on opposite sides
of the x-axis, then the Intermediate Value Theorem implies that at least one root lies
in the interval $[x_{k-1}, x_k]$. But if there is a root, or even several closely spaced roots,
in the interval $[x_{k-1}, x_k]$ and the two adjacent points $(x_{k-1}, y_{k-1})$ and $(x_k, y_k)$ lie on
the same side of the x-axis, then the computer-generated graph would not indicate a
situation where the Intermediate Value Theorem is applicable. The graph produced by
the computer will not be a true representation of the actual graph of the function f.
It is not unusual for functions to have “closely” spaced roots; that is, roots where the
graph touches but does not cross the x-axis, or roots “close” to a vertical asymptote.
Such characteristics of a function need to be considered when applying any numerical
root-finding algorithm.

Finally, near two closely spaced roots or near a double root, the computer-generated
curve between $(x_{k-1}, y_{k-1})$ and $(x_k, y_k)$ may fail to cross or touch the x-axis. If
$|f(x_k)|$ is smaller than a preassigned value $\epsilon$ (i.e., $f(x_k) \approx 0$), then $x_k$ is a tentative
approximate root. But the graph may be close to zero over a wide range of values near
$x_k$, and thus $x_k$ may not be close to an actual root. Hence we add the requirement that
the slope change sign near $(x_k, y_k)$; that is,

$$m_{k-1} = \frac{y_k - y_{k-1}}{x_k - x_{k-1}} \quad\text{and}\quad m_k = \frac{y_{k+1} - y_k}{x_{k+1} - x_k}$$

must have opposite signs. Since $x_k - x_{k-1} > 0$ and $x_{k+1} - x_k > 0$, it is not necessary to use
the difference quotients, and it will suffice to check to see if the differences $y_k - y_{k-1}$
and $y_{k+1} - y_k$ change sign. In this case, $x_k$ is the approximate root. Unfortunately,
we cannot guarantee that this starting value will produce a convergent sequence. If the
graph of y = f(x) has a local minimum (or maximum) that is extremely close to zero,
then it is possible that $x_k$ will be reported as an approximate root when $f(x_k) \approx 0$,
although $x_k$ may not be close to a root.
Table 2.3 Finding Approximate Locations for Roots



Example 2.9. Find the approximate location of the roots of $x^3 - x^2 - x + 1 = 0$ on the
interval [-1.2, 1.2]. For illustration, choose N = 8 and look at Table 2.3.

The three abscissas for consideration are -1.05, -0.3, and 0.9. Because f(x) changes
sign on the interval [-1.2, -0.9], the value -1.05 is an approximate root; indeed,
f(-1.05) = -0.210.

Although the slope changes sign near -0.3, we find that f(-0.3) = 1.183; hence
-0.3 is not near a root. Finally, the slope changes sign near 0.9 and f(0.9) = 0.019, so 0.9
is an approximate root (see Figure 2.10). ■
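The screening in Example 2.9 can be sketched in a few lines of MATLAB. The tolerance `epsilon = 0.05` is an assumption chosen here so that f(0.9) = 0.019 counts as "small"; the example itself does not state a value. The two tests are the sign-change test and the small-value-plus-slope-change test described above.

```matlab
% Sketch: screen 9 equally spaced samples of f(x) = x^3 - x^2 - x + 1.
f = @(x) x.^3 - x.^2 - x + 1;
N = 8;
X = linspace(-1.2, 1.2, N + 1);   % x_0, ..., x_8 with spacing 0.3
Y = f(X);
epsilon = 0.05;                   % assumed tolerance for a "small" |y_k|
R = [];
for k = 2:N+1
    if Y(k-1)*Y(k) < 0            % sign change on [x_{k-1}, x_k]
        R(end+1) = (X(k-1) + X(k))/2;
    end
    % small |y_k| and the differences on either side change sign
    if k <= N && abs(Y(k)) < epsilon && (Y(k)-Y(k-1))*(Y(k+1)-Y(k)) < 0
        R(end+1) = X(k);
    end
end
disp(R)
```

The slope also changes sign at x = -0.3, but f(-0.3) = 1.183 fails the size test, so only the candidates near -1.05 and 0.9 are kept.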
Figure 2.11 (a) The horizontal convergence band for locating a solution to f(x) = 0.

Figure 2.11 (b) The vertical convergence band for locating a solution to f(x) = 0.


Checking for Convergence 

A graph can be used to see the approximate location of a root, but an algorithm must be
used to compute a value $p_n$ that is an acceptable computer solution. Iteration is often
used to produce a sequence $\{p_k\}$ that converges to a root p, and a termination criterion
or strategy must be designed ahead of time so that the computer will stop when an
accurate approximation is reached. Since the goal is to solve f(x) = 0, the final value
$p_n$ should have the property that $|f(p_n)| < \epsilon$.

The user can supply a tolerance value $\epsilon$ for the size of $|f(p_n)|$ and then an iterative
process produces points $P_k = (p_k, f(p_k))$ until the last point $P_n$ lies in the horizontal
band bounded by the lines $y = +\epsilon$ and $y = -\epsilon$, as shown in Figure 2.11(a). This
criterion is useful if the user is trying to solve h(x) = L by applying a root-finding
algorithm to the function f(x) = h(x) - L.

Another termination criterion involves the abscissas, and we can try to determine if
the sequence $\{p_k\}$ is converging. If we draw the vertical lines $x = p + \delta$ and $x = p - \delta$
on each side of x = p, we could decide to stop the iteration when the point $P_n$ lies
between these two vertical lines, as shown in Figure 2.11(b).

The latter criterion is often desired, but it is difficult to implement because it
involves the unknown solution p. We adapt this idea and terminate further calculations
when the consecutive iterates $p_{n-1}$ and $p_n$ are sufficiently close or if they agree within
M significant digits.

Sometimes the user of an algorithm will be satisfied if $p_n \approx p_{n-1}$ and other times
when $f(p_n) \approx 0$. Correct logical reasoning is required to understand the
consequences. If we require that $|p_n - p| < \delta$ and $|f(p_n)| < \epsilon$, the point $P_n$ will be
located in the rectangular region about the solution (p, 0), as shown in Figure 2.12(a).

If we stipulate that $|p_n - p| < \delta$ or $|f(p_n)| < \epsilon$, the point $P_n$ could be located
anywhere in the region formed by the union of the horizontal and vertical stripes, as
shown in Figure 2.12(b). The sizes of the tolerances $\delta$ and $\epsilon$ are crucial. If the
tolerances are chosen too small, iteration may continue forever. They should be chosen
about 100 times larger than $10^{-M}$, where M is the number of decimal digits in the
computer’s floating-point numbers. The closeness of the abscissas is checked with one
of the criteria

$$|p_n - p_{n-1}| < \delta \quad\text{(estimate for the absolute error)}$$

or

$$\frac{2|p_n - p_{n-1}|}{|p_n| + |p_{n-1}|} < \delta \quad\text{(estimate for the relative error)}.$$

The closeness of the ordinate is usually checked by $|f(p_n)| < \epsilon$.
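The criteria above can be evaluated on two consecutive iterates as a quick check. The sketch below uses $c_2$ and $c_3$ from Table 2.2 as $p_{n-1}$ and $p_n$; the tolerances `delta` and `epsilon` are illustrative values picked here, not values prescribed by the text.

```matlab
% Sketch: evaluate the stopping tests on two consecutive iterates.
f = @(x) x.*sin(x) - 1;
p_prev = 1.11416120;                 % c_2 from Table 2.2
p_n    = 1.11415714;                 % c_3 from Table 2.2
delta = 1e-5;  epsilon = 1e-6;       % assumed tolerances
abs_err = abs(p_n - p_prev);                              % absolute error estimate
rel_err = 2*abs(p_n - p_prev)/(abs(p_n) + abs(p_prev));   % relative error estimate
stop_and = (abs_err < delta) && (abs(f(p_n)) < epsilon);  % rectangle, Figure 2.12(a)
stop_or  = (abs_err < delta) || (abs(f(p_n)) < epsilon);  % union, Figure 2.12(b)
fprintf('abs %.2e  rel %.2e  AND %d  OR %d\n', abs_err, rel_err, stop_and, stop_or);
```

Requiring both tests is the stricter choice; the OR form stops as soon as either band is entered.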

Troublesome Functions 

A computer solution to f(x) = 0 will almost always be in error due to roundoff
and/or instability in the calculations. If the graph y = f(x) is steep near the root
(p, 0), then the root-finding problem is well conditioned (i.e., a solution with several
significant digits is easy to obtain). If the graph y = f(x) is shallow near (p, 0), then
the root-finding problem is ill conditioned (i.e., the computed root may have only a few
significant digits). This occurs when f(x) has a multiple root at p. This is discussed
further in the next section.


Figure 2.12 (a) The rectangular region defined by $|x - p| < \delta$ AND $|y| < \epsilon$.

Figure 2.12 (b) The unbounded region defined by $|x - p| < \delta$ OR $|y| < \epsilon$.
Program 2.4 (Approximate Location of Roots). To roughly estimate the locations
of the roots of the equation f(x) = 0 over the interval [a, b], by using the
equally spaced sample points $(x_k, f(x_k))$ and the following criteria:

(i) $(y_{k-1})(y_k) < 0$, or

(ii) $|y_k| < \epsilon$ and $(y_k - y_{k-1})(y_{k+1} - y_k) < 0$.

That is, either $f(x_{k-1})$ and $f(x_k)$ have opposite signs or $|f(x_k)|$ is small and the
slope of the curve y = f(x) changes sign near $(x_k, f(x_k))$.

function R = approot(X,epsilon)
% Input  - f is the object function saved as an M-file named f.m
%        - X is the vector of abscissas
%        - epsilon is the tolerance
% Output - R is the vector of approximate roots
Y=f(X);
yrange = max(Y)-min(Y);
epsilon2 = yrange*epsilon;
n=length(X);
m=0;
R=[];   % returned empty if no approximate root is found
X(n+1)=X(n);
Y(n+1)=Y(n);
for k=2:n,
   if Y(k-1)*Y(k)<=0,
      m=m+1;
      R(m)=(X(k-1)+X(k))/2;
   end
   s=(Y(k)-Y(k-1))*(Y(k+1)-Y(k));
   if (abs(Y(k)) < epsilon2) & (s<=0),
      m=m+1;
      R(m)=X(k);
   end
end

Example 2.10. Use approot to find approximate locations for the roots of $f(x) =
\sin(\cos(x^3))$ in the interval [-2, 2]. First save f as an M-file named f.m. Since the results
will be used as initial approximations for a root-finding algorithm, we will construct X so
that the approximations will be accurate to 4 decimal places.

>>X=-2:.001:2;
>>approot(X,0.00001)
ans=
   -1.9875   -1.6765   -1.1625    1.1625    1.6765    1.9875

Comparing the results with the graph of f, we now have good initial approximations for
one of our root-finding algorithms. ■


Exercises for Initial Approximation 


In Exercises 1 through 6 use a computer or graphics calculator to graphically determine
the approximate location of the roots of f(x) = 0 in the given interval. In each case,
determine an interval [a, b] over which Programs 2.2 and 2.3 could be used to determine
the roots (i.e., f(a)f(b) < 0).

1. $f(x) = x^2 - e^x$ for $-2 \le x \le 2$

2. $f(x) = x - \cos(x)$ for $-2 \le x \le 2$

3. $f(x) = \sin(x) - 2\cos(x)$ for $-2 \le x \le 2$

4. $f(x) = \cos(x) + (1 + x^2)^{-1}$ for $-2 \le x \le 2$

5. $f(x) = (x - 2)^2 - \ln(x)$ for $0.5 \le x \le 4.5$

6. $f(x) = 2x - \tan(x)$ for $-1.4 \le x \le 1.4$


Algorithms and Programs 


In Problems 1 and 2 use a computer or graphics calculator and Program 2.4 to approximate
the real roots, to 4 decimal places, of each function over the given interval. Then use
Program 2.2 or Program 2.3 to approximate each root to 12 decimal places.

1. $f(x) = 1{,}000{,}000x^3 - 111{,}000x^2 + 1110x - 1$ for $-2 \le x \le 2$

2. $f(x) = 5x^{10} - 38x^9 + 21x^8 - 5\pi x^6 - 3\pi x^5 - 5x^2 + 8x - 3$ for $-15 \le x \le 15$.

3. A computer program that plots the graph of y = f(x) over the interval [a, b] using
the points $(x_0, y_0)$, $(x_1, y_1)$, ..., and $(x_N, y_N)$ usually scales the vertical height of
the graph, and a procedure must be written to determine the minimum and maximum
values of f over the interval.

(a) Construct an algorithm that will find the values $y_{\max} = \max_k\{y_k\}$ and $y_{\min} =
\min_k\{y_k\}$.

(b) Write a MATLAB program that will find the approximate location and value of
the extreme values of f(x) on the interval [a, b].

(c) Use your program from part (b) to find the approximate location and value of
the extreme values of the functions in Problems 1 and 2. Compare your
approximations with the actual values.





2.4 Newton-Raphson and Secant Methods 


Slope Methods for Finding Roots 

If f(x), f'(x), and f''(x) are continuous near a root p, then this extra information
regarding the nature of f(x) can be used to develop algorithms that will produce
sequences $\{p_k\}$ that converge faster to p than either the bisection or false position method.
The Newton-Raphson (or simply Newton’s) method is one of the most useful and best
known algorithms that relies on the continuity of f'(x) and f''(x). We shall introduce
it graphically and then give a more rigorous treatment based on the Taylor polynomial.

Assume that the initial approximation $p_0$ is near the root p. Then the graph of
y = f(x) intersects the x-axis at the point (p, 0), and the point $(p_0, f(p_0))$ lies on the
curve near the point (p, 0) (see Figure 2.13). Define $p_1$ to be the point of intersection of
the x-axis and the line tangent to the curve at the point $(p_0, f(p_0))$. Then Figure 2.13
shows that $p_1$ will be closer to p than $p_0$ in this case. An equation relating $p_1$ and $p_0$
can be found if we write down two versions for the slope of the tangent line L:

(1) $$m = \frac{0 - f(p_0)}{p_1 - p_0},$$

which is the slope of the line through $(p_1, 0)$ and $(p_0, f(p_0))$, and

(2) $$m = f'(p_0),$$

which is the slope at the point $(p_0, f(p_0))$. Equating the values of the slope m in
equations (1) and (2) and solving for $p_1$ results in

(3) $$p_1 = p_0 - \frac{f(p_0)}{f'(p_0)}.$$



Figure 2.13 The geometric construction of $p_1$ and $p_2$ for
the Newton-Raphson method.

The process above can be repeated to obtain a sequence $\{p_k\}$ that converges to p.
We now make these ideas more precise.

Theorem 2.5 (Newton-Raphson Theorem). Assume that $f \in C^2[a, b]$ and there
exists a number $p \in [a, b]$, where f(p) = 0. If $f'(p) \ne 0$, then there exists a $\delta > 0$
such that the sequence $\{p_k\}_{k=0}^{\infty}$ defined by the iteration

(4) $$p_k = g(p_{k-1}) = p_{k-1} - \frac{f(p_{k-1})}{f'(p_{k-1})} \quad\text{for } k = 1, 2, \ldots$$

will converge to p for any initial approximation $p_0 \in [p - \delta, p + \delta]$.

Remark. The function g(x) defined by formula

(5) $$g(x) = x - \frac{f(x)}{f'(x)}$$

is called the Newton-Raphson iteration function. Since f(p) = 0, it is easy to see
that g(p) = p. Thus the Newton-Raphson iteration for finding the root of the equation
f(x) = 0 is accomplished by finding a fixed point of the function g(x).

Proof. The geometric construction of $p_1$ shown in Figure 2.13 does not help in
understanding why $p_0$ needs to be close to p or why the continuity of f''(x) is essential.
Our analysis starts with the Taylor polynomial of degree n = 1 and its remainder term:

(6) $$f(x) = f(p_0) + f'(p_0)(x - p_0) + \frac{f''(c)(x - p_0)^2}{2!},$$

where c lies somewhere between $p_0$ and x. Substituting x = p into equation (6) and
using the fact that f(p) = 0 produces

(7) $$0 = f(p_0) + f'(p_0)(p - p_0) + \frac{f''(c)(p - p_0)^2}{2!}.$$

If $p_0$ is close enough to p, the last term on the right side of (7) will be small
compared to the sum of the first two terms. Hence it can be neglected and we can use the
approximation

(8) $$0 \approx f(p_0) + f'(p_0)(p - p_0).$$

Solving for p in equation (8), we get $p \approx p_0 - f(p_0)/f'(p_0)$. This is used to define
the next approximation $p_1$ to the root:

(9) $$p_1 = p_0 - \frac{f(p_0)}{f'(p_0)}.$$

When $p_{k-1}$ is used in place of $p_0$ in equation (9), the general rule (4) is established. For
most applications this is all that needs to be understood. However, to fully comprehend
what is happening, we need to consider the fixed-point iteration function and apply
Theorem 2.2 in our situation. The key is in the analysis of g'(x):

$$g'(x) = 1 - \frac{f'(x)f'(x) - f(x)f''(x)}{(f'(x))^2} = \frac{f(x)f''(x)}{(f'(x))^2}.$$

By hypothesis, f(p) = 0; thus g'(p) = 0. Since g'(p) = 0 and g'(x) is continuous, it
is possible to find a $\delta > 0$ so that the hypothesis $|g'(x)| < 1$ of Theorem 2.2 is satisfied
on $(p - \delta, p + \delta)$. Therefore, a sufficient condition for $p_0$ to initialize a convergent
sequence $\{p_k\}_{k=0}^{\infty}$, which converges to a root of f(x) = 0, is that $p_0 \in (p - \delta, p + \delta)$
and that $\delta$ be chosen so that

(10) $$\frac{|f(x)f''(x)|}{|f'(x)|^2} < 1 \quad\text{for all } x \in (p - \delta, p + \delta).$$


Corollary 2.2 (Newton’s Iteration for Finding Square Roots). Assume that A > 0
is a real number and let $p_0 > 0$ be an initial approximation to $\sqrt{A}$. Define the sequence
$\{p_k\}_{k=0}^{\infty}$ using the recursive rule

(11) $$p_k = \frac{p_{k-1} + \dfrac{A}{p_{k-1}}}{2} \quad\text{for } k = 1, 2, \ldots.$$

Then the sequence converges to $\sqrt{A}$; that is, $\lim_{k\to\infty} p_k = \sqrt{A}$.

Outline of Proof. Start with the function $f(x) = x^2 - A$, and notice that the roots of
the equation $x^2 - A = 0$ are $\pm\sqrt{A}$. Now use f(x) and the derivative f'(x) in formula
(5) and write down the Newton-Raphson iteration formula

(12) $$g(x) = x - \frac{f(x)}{f'(x)} = x - \frac{x^2 - A}{2x}.$$

This formula can be simplified to obtain

(13) $$g(x) = \frac{x + \dfrac{A}{x}}{2}.$$

When g(x) in (13) is used to define the recursive iteration in (4), the result is formula
(11). It can be proved that the sequence that is generated in (11) will converge for any
starting value $p_0 > 0$. The details are left for the exercises. ■
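Formula (11) can be tried directly. The sketch below is an illustration only: the value A = 5, the starting guess $p_0 = 2$, and the fixed count of six iterations are assumptions made here.

```matlab
% Sketch: Newton's square-root iteration, formula (11).
A = 5;                 % assumed example value, A > 0
p = 2;                 % assumed initial approximation p_0 > 0
for k = 1:6
    p = (p + A/p)/2;   % uses only +, / and division by 2
    fprintf('%d  %.15f\n', k, p);
end
```

The built-in `sqrt(A)` is useful only as an independent check; the iteration itself never takes a square root.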


An important point of Corollary 2.2 is the fact that the iteration function g(x)
involved only the arithmetic operations +, -, ×, and /. If g(x) had involved the
calculation of a square root, we would be caught in the circular reasoning that being able
to calculate the square root would permit you to recursively define a sequence that will
converge to $\sqrt{A}$. For this reason, $f(x) = x^2 - A$ was chosen, because it involved only
the arithmetic operations.



Now let us turn to a familiar problem from elementary physics and see why
determining the location of a root is an important task. Suppose that a projectile is fired
from the origin with an angle of elevation $b_0$ and initial velocity $v_0$. In elementary
courses, air resistance is neglected and we learn that the height y = y(t) and the
distance traveled x = x(t), measured in feet, obey the rules

(14) $$y = v_y t - 16t^2 \quad\text{and}\quad x = v_x t,$$

where the horizontal and vertical components of the initial velocity are $v_x = v_0\cos(b_0)$
and $v_y = v_0\sin(b_0)$, respectively. The mathematical model expressed by the rules
in (14) is easy to work with, but tends to give too high an altitude and too long a range
for the projectile’s path. If we make the additional assumption that the air resistance is
proportional to the velocity, the equations of motion become

(15) $$y = f(t) = (Cv_y + 32C^2)\left(1 - e^{-t/C}\right) - 32Ct$$

and

(16) $$x = r(t) = Cv_x\left(1 - e^{-t/C}\right),$$

where C = m/k and k is the coefficient of air resistance and m is the mass of the
projectile. A larger value of C will result in a higher maximum altitude and a longer
range for the projectile. The graph of a flight path of a projectile when air resistance is
considered is shown in Figure 2.14. This improved model is more realistic, but requires
the use of a root-finding algorithm for solving f(t) = 0 to determine the elapsed time
until the projectile hits the ground. The elementary model in (14) does not require a
sophisticated procedure to find the elapsed time.
Figure 2.14 Path of a projectile with air resistance considered.

Table 2.4 Finding the Time When the Height f(t) Is Zero

 k   Time, p_k     p_{k+1} - p_k   Height, f(p_k)
 0   8.00000000     0.79773101      83.22097200
 1   8.79773101    -0.05530160      -6.68369700
 2   8.74242941    -0.00025475      -0.03050700
 3   8.74217467    -0.00000001      -0.00000100
 4   8.74217466     0.00000000       0.00000000


Example 2.12. A projectile is fired with an angle of elevation $b_0 = 45°$, $v_y = v_x =
160$ ft/sec, and C = 10. Find the elapsed time until impact and find the range.

Using formulas (15) and (16), the equations of motion are $y = f(t) = 4800(1 -
e^{-t/10}) - 320t$ and $x = r(t) = 1600(1 - e^{-t/10})$. Since f(8) = 83.220972 and f(9) =
-31.534367, we will use the initial guess $p_0 = 8$. The derivative is $f'(t) = 480e^{-t/10} -
320$, and its value $f'(p_0) = f'(8) = -104.3220972$ is used in formula (4) to get

$$p_1 = 8 - \frac{83.22097200}{-104.3220972} = 8.797731010.$$

A summary of the calculation is given in Table 2.4.

The value $p_4$ has eight decimal places of accuracy, and the time until impact is $t \approx
8.74217466$ seconds. The range can now be computed using r(t), and we get

$$r(8.74217466) = 1600\left(1 - e^{-0.874217466}\right) = 932.4986302 \text{ ft}. \quad ■$$
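The calculation in Example 2.12 can be repeated with a few lines of MATLAB. This is a sketch under the example's own data (C = 10, $v_x = v_y = 160$); the handle names `f`, `df`, and `r` and the fixed count of five Newton steps are choices made here.

```matlab
% Sketch: Newton-Raphson iteration for the projectile height f(t) = 0.
C = 10;  vx = 160;  vy = 160;
f  = @(t) (C*vy + 32*C^2)*(1 - exp(-t/C)) - 32*C*t;   % formula (15)
df = @(t) (vy + 32*C)*exp(-t/C) - 32*C;               % f'(t) = 480e^{-t/10} - 320
r  = @(t) C*vx*(1 - exp(-t/C));                       % formula (16)
p = 8;                                                % initial guess p_0
for k = 0:4
    dp = -f(p)/df(p);                                 % p_{k+1} - p_k
    fprintf('%d  %.8f  %11.8f  %12.8f\n', k, p, dp, f(p));
    p = p + dp;
end
fprintf('time %.8f sec, range %.7f ft\n', p, r(p));
```

Each printed row has the same layout as Table 2.4: k, $p_k$, $p_{k+1} - p_k$, and $f(p_k)$.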


The Division-by-Zero Error 

One obvious pitfall of the Newton-Raphson method is the possibility of division by
zero in formula (4), which would occur if $f'(p_{k-1}) = 0$. Program 2.5 has a procedure
to check for this situation, but what use is the last calculated approximation $p_{k-1}$ in
this case? It is quite possible that $f(p_{k-1})$ is sufficiently close to zero and that $p_{k-1}$
is an acceptable approximation to the root. We now investigate this situation and will
uncover an interesting fact, that is, how fast the iteration converges.

Definition 2.4 (Order of a Root). Assume that f(x) and its derivatives f'(x), ...,
$f^{(M)}(x)$ are defined and continuous on an interval about x = p. We say that
f(x) = 0 has a root of order M at x = p if and only if

(17) $$f(p) = 0, \quad f'(p) = 0, \quad \ldots, \quad f^{(M-1)}(p) = 0, \quad\text{and}\quad f^{(M)}(p) \ne 0.$$

A root of order M = 1 is often called a simple root, and if M > 1, it is called a
multiple root. A root of order M = 2 is sometimes called a double root, and so on.
The next result will illuminate these concepts.

Lemma 2.1. If the equation f(x) = 0 has a root of order M at x = p, then there
exists a continuous function h(x) so that f(x) can be expressed as the product

(18) $$f(x) = (x - p)^M h(x), \quad\text{where } h(p) \ne 0.$$

Example 2.13. The function $f(x) = x^3 - 3x + 2$ has a simple root at p = -2 and a
double root at p = 1. This can be verified by considering the derivatives $f'(x) = 3x^2 - 3$
and $f''(x) = 6x$. At the value p = -2, we have f(-2) = 0 and f'(-2) = 9, so
M = 1 in Definition 2.4; hence p = -2 is a simple root. For the value p = 1, we have
f(1) = 0, f'(1) = 0, and f''(1) = 6, so M = 2 in Definition 2.4; hence p = 1 is a double
root. Also, notice that f(x) has the factorization $f(x) = (x + 2)(x - 1)^2$. ■
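The checks in Example 2.13 can be reproduced with MATLAB's polynomial routines `polyval`, `polyder`, and `conv`. The coefficient-vector names below are chosen here for illustration.

```matlab
% Sketch: verify the order of the roots of f(x) = x^3 - 3x + 2.
c   = [1 0 -3 2];        % coefficients of x^3 - 3x + 2
dc  = polyder(c);        % coefficients of f'(x)  = 3x^2 - 3
d2c = polyder(dc);       % coefficients of f''(x) = 6x
for p = [-2 1]
    fprintf('p = %2d: f = %g, f'' = %g, f'''' = %g\n', ...
            p, polyval(c, p), polyval(dc, p), polyval(d2c, p));
end
% (x + 2)(x - 1)^2 expanded by polynomial multiplication
factored = conv([1 2], conv([1 -1], [1 -1]));
disp(isequal(factored, c))
```

At p = -2 the first derivative is already nonzero (M = 1); at p = 1 the first nonzero derivative is the second (M = 2).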


Speed of Convergence 

The distinguishing property we seek is the following. If p is a simple root of f(x) = 0, 
Newton’s method will converge rapidly, and the number of accurate decimal places 
(roughly) doubles with each iteration. On the other hand, if p is a multiple root, the 
error in each successive approximation is a fraction of the previous error. To make 
this precise, we define the order of convergence. This is a measure of how rapidly a 
sequence converges. 


Definition 2.5 (Order of Convergence). Assume that $\{p_n\}_{n=0}^{\infty}$ converges to p and
set $E_n = p - p_n$ for $n \ge 0$. If two positive constants $A \ne 0$ and R > 0 exist, and

(19) $$\lim_{n\to\infty} \frac{|p - p_{n+1}|}{|p - p_n|^R} = \lim_{n\to\infty} \frac{|E_{n+1}|}{|E_n|^R} = A,$$
Table 2.5 Newton’s Method Converges Quadratically at a Simple Root

 k   p_k            p_{k+1} - p_k   E_k = p - p_k   |E_{k+1}|/|E_k|^2
 0   -2.400000000   0.323809524     0.400000000     0.476190475
 1   -2.076190476   0.072594465     0.076190476     0.619469086
 2   -2.003596011   0.003587422     0.003596011     0.664202613
 3   -2.000008589   0.000008589     0.000008589
 4   -2.000000000   0.000000000     0.000000000


then the sequence is said to converge to p with order of convergence R. The
number A is called the asymptotic error constant. The cases R = 1, 2 are given special
consideration.

(20) If R = 1, the convergence of $\{p_n\}_{n=0}^{\infty}$ is called linear.

(21) If R = 2, the convergence of $\{p_n\}_{n=0}^{\infty}$ is called quadratic.

If R is large, the sequence $\{p_n\}$ converges rapidly to p; that is, relation (19) implies
that for large values of n we have the approximation $|E_{n+1}| \approx A|E_n|^R$. For example,
suppose that R = 2 and $|E_n| \approx 10^{-2}$; then we would expect that $|E_{n+1}| \approx A \times 10^{-4}$.

Some sequences converge at a rate that is not an integer, and we will see that the
order of convergence of the secant method is $R = (1 + \sqrt{5})/2 \approx 1.618033989$.

Example 2.14 (Quadratic Convergence at a Simple Root). Start with $p_0 = -2.4$
and use Newton-Raphson iteration to find the root p = -2 of the polynomial $f(x) =
x^3 - 3x + 2$. The iteration formula for computing $\{p_k\}$ is

(22) $$p_k = g(p_{k-1}) = \frac{2p_{k-1}^3 - 2}{3p_{k-1}^2 - 3}.$$

Using formula (21) to check for quadratic convergence, we get the values in Table 2.5. ■

A detailed look at the rate of convergence in Example 2.14 will reveal that the error
in each successive iteration is proportional to the square of the error in the previous
iteration. That is,

$$|p - p_{k+1}| \approx A|p - p_k|^2,$$

where $A \approx 2/3$. To check this, we use

$$|p - p_3| = 0.000008589 \quad\text{and}\quad |p - p_2|^2 = |0.003596011|^2 = 0.000012931$$

and it is easy to see that

$$|p - p_3| = 0.000008589 \approx 0.000008621 = \tfrac{2}{3}|p - p_2|^2.$$
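The ratios $|E_{k+1}|/|E_k|^2$ for Example 2.14 can be recomputed from formula (22). The sketch below stores $p_k$ in `p(k+1)` because MATLAB indices start at 1; the array names are choices made here.

```matlab
% Sketch: error ratios for Newton's method at the simple root p = -2.
g = @(p) (2*p.^3 - 2)./(3*p.^2 - 3);     % formula (22)
p = zeros(1, 5);
p(1) = -2.4;                             % p_0
for k = 1:4
    p(k+1) = g(p(k));                    % p_k = g(p_{k-1})
end
E = -2 - p;                              % E_k = p - p_k with p = -2
ratio = abs(E(2:4))./abs(E(1:3)).^2;     % |E_{k+1}|/|E_k|^2 for k = 0, 1, 2
disp(ratio)
```

The ratios climb toward 2/3, the value of A quoted above. Only k = 0, 1, 2 are used because later errors are too close to the rounding level for the quotient to be meaningful.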




Table 2.6 Newton’s Method Converges Linearly at a Double Root

 k   p_k           p_{k+1} - p_k   E_k = p - p_k   |E_{k+1}|/|E_k|
 0   1.200000000   -0.096969697    -0.200000000    0.515151515
 1   1.103030303   -0.050673883    -0.103030303    0.508165253
 2   1.052356420   -0.025955609    -0.052356420    0.496751115
 3   1.026400811   -0.013143081    -0.026400811    0.509753688
 4   1.013257730   -0.006614311    -0.013257730    0.501097775
 5   1.006643419   -0.003318055    -0.006643419    0.500550093


Example 2.15 (Linear Convergence at a Double Root). Start with $p_0 = 1.2$ and use
Newton-Raphson iteration to find the double root p = 1 of the polynomial $f(x) = x^3 -
3x + 2$.

Using formula (20) to check for linear convergence, we get the values in Table 2.6. ■

Notice that the Newton-Raphson method is converging to the double root, but at
a slow rate. The values of $f(p_k)$ in Example 2.15 go to zero faster than the values
of $f'(p_k)$, so the quotient $f(p_k)/f'(p_k)$ in formula (4) is defined when $p_k \ne p$.
The sequence is converging linearly, and the error is decreasing by a factor of
approximately 1/2 with each successive iteration. The following theorem summarizes the
performance of Newton’s method on simple and multiple roots.

Theorem 2.6 (Convergence Rate for Newton-Raphson Iteration). Assume that
Newton-Raphson iteration produces a sequence $\{p_n\}_{n=0}^{\infty}$ that converges to the root p
of the function f(x). If p is a simple root, convergence is quadratic and

(23) $$|E_{n+1}| \approx \frac{|f''(p)|}{2|f'(p)|}|E_n|^2 \quad\text{for } n \text{ sufficiently large}.$$

If p is a multiple root of order M, convergence is linear and

(24) $$|E_{n+1}| \approx \frac{M - 1}{M}|E_n| \quad\text{for } n \text{ sufficiently large}.$$
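Relation (24) with M = 2 predicts an error ratio near 1/2 at the double root p = 1 of $f(x) = x^3 - 3x + 2$. The sketch below iterates formula (22) from $p_0 = 1.2$ and records $|E_{k+1}|/|E_k|$; the array name `ratios` and the six-step count are choices made here.

```matlab
% Sketch: error ratios for Newton's method at the double root p = 1.
g = @(p) (2*p.^3 - 2)./(3*p.^2 - 3);   % Newton iteration function for x^3 - 3x + 2
p = 1.2;                               % p_0
ratios = zeros(1, 6);
for k = 0:5
    pnext = g(p);
    ratios(k+1) = abs(1 - pnext)/abs(1 - p);   % |E_{k+1}|/|E_k|
    fprintf('%d  %.9f  %.9f\n', k, p, ratios(k+1));
    p = pnext;
end
```

The ratios settle near (M - 1)/M = 1/2, in contrast with the squared-error behavior at the simple root p = -2.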

Pitfalls 

The division-by-zero error was easy to anticipate, but there are other difficulties that
are not so easy to spot. Suppose that the function is $f(x) = x^2 - 4x + 5$; then the
sequence $\{p_k\}$ of real numbers generated by formula (4) will wander back and forth
from left to right and not converge. A simple analysis of the situation reveals that
f(x) > 0 and has no real roots.



78 Char 2 The Solution of Nonlinear equations f(x) = 0 




Figure 2.15 (a) Newton-Raphson iteration for $f(x) = xe^{-x}$ can produce a divergent sequence.


Sometimes the initial approximation $p_0$ is too far away from the desired root and the sequence $\{p_k\}$ converges to some other root. This usually happens when the slope $f'(p_0)$ is small and the tangent line to the curve $y = f(x)$ is nearly horizontal. For example, if $f(x) = \cos(x)$ and we seek the root $p = \pi/2$ and start with $p_0 = 3$, calculation reveals that $p_1 = -4.01525255$, $p_2 = -4.85265757$, ..., and $\{p_k\}$ will converge to a different root $-3\pi/2 \approx -4.71238898$.

Suppose that $f(x)$ is positive and monotone decreasing on the unbounded interval $[a, \infty)$ and $p_0 > a$; then the sequence $\{p_k\}$ might diverge to $+\infty$. For example, if $f(x) = xe^{-x}$ and $p_0 = 2.0$, then

$p_1 = 4.0$, $p_2 = 5.333333333$, ..., $p_{15} = 19.723549434$, ...,

and $\{p_k\}$ diverges slowly to $+\infty$ (see Figure 2.15(a)). This particular function has another surprising problem. The value of $f(x)$ goes to zero rapidly as $x$ gets large; for example, $f(p_{15}) = 0.0000000536$, and it is possible that $p_{15}$ could be mistaken for a root. For this reason we designed the stopping criterion in Program 2.5 to involve the relative error $2|p_{k+1} - p_k|/(|p_k| + 10^{-6})$, and when $k = 15$, this value is 0.106817, so the tolerance $\delta = 10^{-6}$ will help guard against reporting a false root.
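The numbers quoted for $f(x) = xe^{-x}$ can be checked with a few lines. This is a sketch under the assumption that anonymous functions are available; the names `pk`, `pnext`, and `relerr` are illustrative and the relative error is computed exactly as written in the text, with $|p_k|$ in the denominator.

```matlab
% Sketch: slow divergence of Newton-Raphson for f(x) = x*exp(-x), p0 = 2.
f  = @(x) x.*exp(-x);
df = @(x) (1 - x).*exp(-x);
delta = 1e-6;                  % tolerance used in the relative-error test
pk = 2.0;                      % p0
for k = 0:14
    pk = pk - f(pk)/df(pk);    % after the loop, pk holds p_15
end
pnext  = pk - f(pk)/df(pk);    % p_16
relerr = 2*abs(pnext - pk)/(abs(pk) + delta);
fprintf('p15 = %.9f  f(p15) = %.10f  relerr = %.6f\n', pk, f(pk), relerr);
```

Although $f(p_{15})$ is tiny, the relative error is about 0.1, far above $\delta$, so a test on the relative error would not accept $p_{15}$ as a root.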

Another phenomenon, cycling, occurs when the terms in the sequence $\{p_k\}$ tend to repeat or almost repeat. For example, if $f(x) = x^3 - x - 3$ and the initial approximation is $p_0 = 0$, then the sequence is

$p_1 = -3.000000$, $p_2 = -1.961538$, $p_3 = -1.147176$, $p_4 = -0.006579$,
$p_5 = -3.000389$, $p_6 = -1.961818$, $p_7 = -1.147430$, ...,

and we are stuck in a cycle where $p_{k+4} \approx p_k$ for $k = 0, 1, \ldots$ (see Figure 2.15(b)). But if the starting value $p_0$ is sufficiently close to the root $p \approx 1.671699881$, then $\{p_k\}$



Figure 2.15 (b) Newton-Raphson iteration for $f(x) = x^3 - x - 3$ can produce a cyclic sequence.

Figure 2.15 (c) Newton-Raphson iteration for $f(x) = \arctan(x)$ can produce a divergent oscillating sequence.


converges. If $p_0 = 2$, the sequence converges: $p_1 = 1.72727272$, $p_2 = 1.67369173$, $p_3 = 1.671702570$, and $p_4 = 1.671699881$.
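The cycling behavior is easy to reproduce. The sketch below is an illustration rather than one of the book's programs; the array `P` and the scalar `q` are names introduced here. It runs formula (4) on $f(x) = x^3 - x - 3$ from $p_0 = 0$ and, for contrast, from $p_0 = 2$.

```matlab
% Sketch: cycling of Newton-Raphson for f(x) = x^3 - x - 3.
f  = @(x) x.^3 - x - 3;
df = @(x) 3*x.^2 - 1;
P = zeros(1,9);                % P(k+1) stores p_k
P(1) = 0;                      % p0 = 0 leads to a near-cycle of length 4
for k = 1:8
    P(k+1) = P(k) - f(P(k))/df(P(k));
end
fprintf('%.6f\n', P);
q = 2;                         % p0 = 2 is close enough to the root
for k = 1:4
    q = q - f(q)/df(q);
end
fprintf('from p0 = 2: %.9f\n', q);
```

The first run shows $p_5$ almost returning to $p_1$, while the second run reaches the root in four steps.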

When $|g'(x)| \ge 1$ on an interval containing the root $p$, there is a chance of divergent oscillation. For example, let $f(x) = \arctan(x)$; then the Newton-Raphson iteration function is $g(x) = x - (1 + x^2)\arctan(x)$, and $g'(x) = -2x\arctan(x)$. If the starting value $p_0 = 1.45$ is chosen, then

$p_1 = -1.550263297$, $p_2 = 1.845931751$, $p_3 = -2.889109054$,

etc. (see Figure 2.15(c)). But if the starting value is sufficiently close to the root $p = 0$,











Figure 2.16 The geometric construction of $p_2$ for the secant method.


a convergent sequence results. If $p_0 = 0.5$, then

$p_1 = -0.079559511$, $p_2 = 0.000335302$, $p_3 = 0.000000000$.
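Both arctan runs can be compared side by side. This is a small sketch, assuming anonymous functions; `a` and `b` are illustrative names for the two sequences started at 1.45 and 0.5.

```matlab
% Sketch: Newton-Raphson for f(x) = atan(x) from two starting values.
g = @(x) x - (1 + x.^2).*atan(x);   % Newton-Raphson iteration function
a = 1.45;                           % start that oscillates and diverges
b = 0.5;                            % start that converges to the root p = 0
for k = 1:3
    a = g(a);
    b = g(b);
    fprintf('k = %d   a = % .9f   b = % .9f\n', k, a, b);
end
```

After three steps the first sequence has moved away from the root with alternating sign, while the second is already zero to nine decimal places.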

The situations above point to the fact that we must be honest in reporting an answer. 
Sometimes the sequence does not converge. It is not always the case that after N 
iterations a solution is found. The user of a root-finding algorithm needs to be warned 
of the situation when a root is not found. If there is other information concerning 
the context of the problem, then it is less likely that an erroneous root will be found. 
Sometimes $f(x)$ has a definite interval in which a root is meaningful. If knowledge of the behavior of the function or an "accurate" graph is available, then it is easier to choose $p_0$.

The Secant Method 

The Newton-Raphson algorithm requires the evaluation of two functions per iteration, $f(p_{k-1})$ and $f'(p_{k-1})$. Traditionally, the calculation of derivatives of elementary functions could involve considerable effort. But with modern computer algebra software packages, this has become less of an issue. Still, many functions have nonelementary forms (integrals, sums, etc.), and it is desirable to have a method that converges almost as fast as Newton's method yet involves only evaluations of $f(x)$ and not of $f'(x)$. The secant method will require only one evaluation of $f(x)$ per step and at a simple root has an order of convergence $R \approx 1.618033989$. It is almost as fast as Newton's method, which has order 2.

The formula involved in the secant method is the same one that was used in the regula falsi method, except that the logical decisions regarding how to define each succeeding term are different. Two initial points $(p_0, f(p_0))$ and $(p_1, f(p_1))$ near the point $(p, 0)$ are needed, as shown in Figure 2.16. Define $p_2$ to be the abscissa of the point of intersection of the line through these two points and the x-axis; then Figure 2.16 shows that $p_2$ will be closer to $p$ than either $p_0$ or $p_1$. The equation relating $p_2$, $p_1$, and $p_0$ is found by considering the slope

(25) $m = \dfrac{f(p_1) - f(p_0)}{p_1 - p_0}$ and $m = \dfrac{0 - f(p_1)}{p_2 - p_1}$.

The values of $m$ in (25) are the slope of the secant line through the first two approximations and the slope of the line through $(p_1, f(p_1))$ and $(p_2, 0)$, respectively. Set the right-hand sides equal in (25) and solve for $p_2 = g(p_1, p_0)$ and get

(26) $p_2 = g(p_1, p_0) = p_1 - \dfrac{f(p_1)(p_1 - p_0)}{f(p_1) - f(p_0)}$.

The general term is given by the two-point iteration formula

(27) $p_{k+1} = g(p_k, p_{k-1}) = p_k - \dfrac{f(p_k)(p_k - p_{k-1})}{f(p_k) - f(p_{k-1})}$.

Example 2.16 (Secant Method at a Simple Root). Start with $p_0 = -2.6$ and $p_1 = -2.4$ and use the secant method to find the root $p = -2$ of the polynomial function $f(x) = x^3 - 3x + 2$.

In this case the iteration formula (27) is

(28) $p_{k+1} = g(p_k, p_{k-1}) = p_k - \dfrac{(p_k^3 - 3p_k + 2)(p_k - p_{k-1})}{p_k^3 - p_{k-1}^3 - 3p_k + 3p_{k-1}}$.

This can be algebraically manipulated to obtain

(29) $p_{k+1} = g(p_k, p_{k-1}) = \dfrac{p_k^2 p_{k-1} + p_k p_{k-1}^2 - 2}{p_k^2 + p_k p_{k-1} + p_{k-1}^2 - 3}$.

The sequence of iterates is given in Table 2.7. ■

Table 2.7 Convergence of the Secant Method at a Simple Root

k    p_k             p_{k+1} - p_k    E_k = p - p_k    |E_{k+1}|/|E_k|^{1.618}
0    -2.600000000    0.200000000      0.600000000      0.914152831
1    -2.400000000    0.293401015      0.400000000      0.469497765
2    -2.106598985    0.083957573      0.106598985      0.847290012
3    -2.022641412    0.021130314      0.022641412      0.693608522
4    -2.001511098    0.001488561      0.001511098      0.825841116
5    -2.000022537    0.000022515      0.000022537      0.727100987
6    -2.000000022    0.000000022      0.000000022
7    -2.000000000    0.000000000      0.000000000
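The iterates of Example 2.16 can be regenerated directly from formula (27). The lines below are a sketch with illustrative names (`P` is not a variable used by the book's Program 2.6) and assume anonymous functions are available.

```matlab
% Sketch: secant iterates for f(x) = x^3 - 3x + 2 with p0 = -2.6, p1 = -2.4.
f = @(x) x.^3 - 3*x + 2;
P = zeros(1,8);                % P(k+1) stores p_k
P(1) = -2.6;
P(2) = -2.4;
for k = 2:7
    % formula (27): new iterate from the two most recent ones
    P(k+1) = P(k) - f(P(k))*(P(k) - P(k-1))/(f(P(k)) - f(P(k-1)));
end
fprintf('%.9f\n', P);
```

Only one new function value is needed per step if $f(p_{k-1})$ is saved from the previous pass; the sketch recomputes it for brevity.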
There is a relationship between the secant method and Newton's method. For a polynomial function $f(x)$, the secant method two-point formula $p_{k+1} = g(p_k, p_{k-1})$ will reduce to Newton's one-point formula $p_{k+1} = g(p_k)$ if $p_k$ is replaced by $p_{k-1}$. Indeed, if we replace $p_k$ by $p_{k-1}$ in (29), then the right side becomes the same as the right side of (22) in Example 2.14.

Proofs about the rate of convergence of the secant method can be found in advanced texts on numerical analysis. Let us state that the error terms satisfy the relationship

(30) $|E_{k+1}| \approx |E_k|^{1.618} \left| \dfrac{f''(p)}{2f'(p)} \right|^{0.618}$,

where the order of convergence is $R = (1 + \sqrt{5})/2 \approx 1.618$ and the relation in (30) is valid only at simple roots.

To check this, we make use of Example 2.16 and the specific values

$|p - p_5| = 0.000022537$,
$|p - p_4|^{1.618} = 0.001511098^{1.618} = 0.000027296$,

and

$A = |f''(-2)/(2f'(-2))|^{0.618} = (2/3)^{0.618} = 0.778351205$.

Combine these and it is easy to see that

$|p - p_5| = 0.000022537 \approx 0.000021246 = A|p - p_4|^{1.618}$.
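The arithmetic in this check can be repeated in a few lines. This is only a sketch; the names `E4`, `E5`, `A`, and `predicted` are introduced here, and the exponents 1.618 and 0.618 are the rounded values used in the text.

```matlab
% Sketch: checking relation (30) with the errors from Example 2.16.
E4 = 0.001511098;                     % |p - p_4| from Table 2.7
E5 = 0.000022537;                     % |p - p_5| from Table 2.7
d1 = @(x) 3*x.^2 - 3;                 % f'(x)  for f(x) = x^3 - 3x + 2
d2 = @(x) 6*x;                        % f''(x)
A  = abs(d2(-2)/(2*d1(-2)))^0.618;    % constant in (30), equals (2/3)^0.618
predicted = A*E4^1.618;               % right-hand side of (30)
fprintf('A = %.9f  predicted = %.9f  actual = %.9f\n', A, predicted, E5);
```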


Accelerated Convergence 

We could hope that there are root-finding techniques that converge faster than linearly 
when p is a root of order M . Our final result shows that a modification can be made to 
Newton’s method so that convergence becomes quadratic at a multiple root. 


Theorem 2.7 (Acceleration of Newton-Raphson Iteration). Suppose that the Newton-Raphson algorithm produces a sequence that converges linearly to the root $x = p$ of order $M > 1$. Then the Newton-Raphson iteration formula

(31) $p_k = p_{k-1} - \dfrac{M f(p_{k-1})}{f'(p_{k-1})}$

will produce a sequence $\{p_k\}_{k=0}^{\infty}$ that converges quadratically to $p$.

Example 2.17 (Acceleration of Convergence at a Double Root). Start with $p_0 = 1.2$ and use accelerated Newton-Raphson iteration to find the double root $p = 1$ of $f(x) = x^3 - 3x + 2$.

Since $M = 2$, the acceleration formula (31) becomes

$p_k = p_{k-1} - \dfrac{2 f(p_{k-1})}{f'(p_{k-1})} = \dfrac{p_{k-1}^3 + 3p_{k-1} - 4}{3p_{k-1}^2 - 3}$,

and we obtain the values in Table 2.8. ■

Table 2.8 Acceleration of Convergence at a Double Root

k    p_k            p_{k+1} - p_k    E_k = p - p_k    |E_{k+1}|/|E_k|^2
0    1.200000000    -0.193939394     -0.200000000     0.151515150
1    1.006060606    -0.006054519     -0.006060606     0.165718578
2    1.000006087    -0.000006087     -0.000006087
3    1.000000000    0.000000000      0.000000000

Table 2.9 compares the speed of convergence of the various root-finding methods that we have studied so far. The value of the constant $A$ is different for each method.

Table 2.9 Comparison of the Speed of Convergence

Method                        Special considerations    Relation between successive error terms
Bisection                                               E_{k+1} ≈ (1/2)|E_k|
Regula falsi                                            E_{k+1} ≈ A|E_k|
Secant method                 Multiple root             E_{k+1} ≈ A|E_k|
Newton-Raphson                Multiple root             E_{k+1} ≈ A|E_k|
Secant method                 Simple root               E_{k+1} ≈ A|E_k|^{1.618}
Newton-Raphson                Simple root               E_{k+1} ≈ A|E_k|^2
Accelerated Newton-Raphson    Multiple root             E_{k+1} ≈ A|E_k|^2
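The accelerated iteration with a known order $M$ differs from formula (4) only by the factor $M$. The sketch below is illustrative (the names `M` and `P` are chosen here, and anonymous functions are assumed); it applies the iteration to the double root $p = 1$ of $f(x) = x^3 - 3x + 2$ from $p_0 = 1.2$.

```matlab
% Sketch: accelerated Newton-Raphson with M = 2 for f(x) = x^3 - 3x + 2.
f  = @(x) x.^3 - 3*x + 2;
df = @(x) 3*x.^2 - 3;
M  = 2;                        % order of the root, assumed known in advance
P  = zeros(1,4);               % P(k+1) stores p_k
P(1) = 1.2;
for k = 1:3
    P(k+1) = P(k) - M*f(P(k))/df(P(k));   % Newton step scaled by M
end
fprintf('%.9f\n', P);
```

Three steps give nine correct decimal places, compared with the slow halving of the error under the unmodified iteration. Very close to the root both $f$ and $f'$ are nearly zero, so a production routine would guard the division.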
Program 2.5 (Newton-Raphson Iteration). To approximate a root of $f(x) = 0$ given one initial approximation $p_0$ and using the iteration

$p_k = p_{k-1} - \dfrac{f(p_{k-1})}{f'(p_{k-1})}$ for $k = 1, 2, \ldots$.

function [p0,err,k,y]=newton(f,df,p0,delta,epsilon,max1)
%Input  - f is the object function input as a string 'f'
%       - df is the derivative of f input as a string 'df'
%       - p0 is the initial approximation to a zero of f
%       - delta is the tolerance for p0
%       - epsilon is the tolerance for the function values y
%       - max1 is the maximum number of iterations
%Output - p0 is the Newton-Raphson approximation to the zero
%       - err is the error estimate for p0
%       - k is the number of iterations
%       - y is the function value f(p0)
for k=1:max1
    p1=p0-feval(f,p0)/feval(df,p0);
    err=abs(p1-p0);
    relerr=2*err/(abs(p1)+delta);
    p0=p1;
    y=feval(f,p0);
    if (err<delta)|(relerr<delta)|(abs(y)<epsilon),break,end
end


Program 2.6 (Secant Method). To approximate a root of $f(x) = 0$ given two initial approximations $p_0$ and $p_1$ and using the iteration

$p_{k+1} = p_k - \dfrac{f(p_k)(p_k - p_{k-1})}{f(p_k) - f(p_{k-1})}$ for $k = 1, 2, \ldots$.

function [p1,err,k,y]=secant(f,p0,p1,delta,epsilon,max1)
%Input  - f is the object function input as a string 'f'
%       - p0 and p1 are the initial approximations to a zero
%       - delta is the tolerance for p1
%       - epsilon is the tolerance for the function values y
%       - max1 is the maximum number of iterations
%Output - p1 is the secant method approximation to the zero
%       - err is the error estimate for p1
%       - k is the number of iterations
%       - y is the function value f(p1)
for k=1:max1
    p2=p1-feval(f,p1)*(p1-p0)/(feval(f,p1)-feval(f,p0));
    err=abs(p2-p1);
    relerr=2*err/(abs(p2)+delta);
    p0=p1;
    p1=p2;
    y=feval(f,p1);
    if (err<delta)|(relerr<delta)|(abs(y)<epsilon),break,end
end


Exercises for Newton-Raphson and Secant Methods 


For problems involving calculations, you can use either a calculator or computer. 

1. Let $f(x) = x^2 - x + 2$.

(a) Find the Newton-Raphson formula $p_k = g(p_{k-1})$.

(b) Start with $p_0 = -1.5$ and find $p_1$, $p_2$, and $p_3$.

2. Let $f(x) = x^2 - x - 3$.

(a) Find the Newton-Raphson formula $p_k = g(p_{k-1})$.

(b) Start with $p_0 = 1.6$ and find $p_1$, $p_2$, and $p_3$.

(c) Start with $p_0 = 0.0$ and find $p_1$, $p_2$, $p_3$, and $p_4$. What do you conjecture about this sequence?

3. Let $f(x) = (x - 2)^4$.

(a) Find the Newton-Raphson formula $p_k = g(p_{k-1})$.

(b) Start with $p_0 = 2.1$ and find $p_1$, $p_2$, $p_3$, and $p_4$.

(c) Is the sequence converging quadratically or linearly?

4. Let $f(x) = x^3 - 3x - 2$.

(a) Find the Newton-Raphson formula $p_k = g(p_{k-1})$.

(b) Start with $p_0 = 2.1$ and find $p_1$, $p_2$, $p_3$, and $p_4$.

(c) Is the sequence converging quadratically or linearly?

5. Consider the function $f(x) = \cos(x)$.

(a) Find the Newton-Raphson formula $p_k = g(p_{k-1})$.

(b) We want to find the root $p = 3\pi/2$. Can we use $p_0 = 3$? Why?

(c) We want to find the root $p = 3\pi/2$. Can we use $p_0 = 5$? Why?

6. Consider the function $f(x) = \arctan(x)$.

(a) Find the Newton-Raphson formula $p_k = g(p_{k-1})$.

(b) If $p_0 = 1.0$, then find $p_1$, $p_2$, $p_3$, and $p_4$. What is $\lim_{k\to\infty} p_k$?

(c) If $p_0 = 2.0$, then find $p_1$, $p_2$, $p_3$, and $p_4$. What is $\lim_{k\to\infty} p_k$?





7. Consider the function $f(x) = xe^{-x}$.

(a) Find the Newton-Raphson formula $p_k = g(p_{k-1})$.

(b) If $p_0 = 0.2$, then find $p_1$, $p_2$, $p_3$, and $p_4$. What is $\lim_{k\to\infty} p_k$?

(c) If $p_0 = 20$, then find $p_1$, $p_2$, $p_3$, and $p_4$. What is $\lim_{k\to\infty} p_k$?

(d) What is the value of $f(p_4)$ in part (c)?

In Exercises 8 through 10, use the secant method and formula (27) and compute the next two iterates $p_2$ and $p_3$.

8. Let $f(x) = x^2 - 2x - 1$. Start with $p_0 = 2.6$ and $p_1 = 2.5$.

9. Let $f(x) = x^2 - x - 3$. Start with $p_0 = 1.7$ and $p_1 = 1.67$.

10. Let $f(x) = x^3 - x + 2$. Start with $p_0 = -1.5$ and $p_1 = -1.52$.

11. Cube-root algorithm. Start with $f(x) = x^3 - A$, where $A$ is any real number, and derive the recursive formula

$p_k = \dfrac{2p_{k-1} + A/p_{k-1}^2}{3}$ for $k = 1, 2, \ldots$.

12. Consider $f(x) = x^N - A$, where $N$ is a positive integer.

(a) What real values are the solution to $f(x) = 0$ for the various choices of $N$ and $A$ that can arise?

(b) Derive the recursive formula

$p_k = \dfrac{(N - 1)p_{k-1} + A/p_{k-1}^{N-1}}{N}$ for $k = 1, 2, \ldots$

for finding the $N$th root of $A$.

13. Can Newton-Raphson iteration be used to solve $f(x) = 0$ if $f(x) = x^2 - 14x + 50$? Why?

14. Can Newton-Raphson iteration be used to solve $f(x) = 0$ if $f(x) = x^{1/3}$? Why?

15. Can Newton-Raphson iteration be used to solve $f(x) = 0$ if $f(x) = (x - 3)^{1/2}$ and the starting value is $p_0 = 4$? Why?

16. Establish the limit of the sequence in (11).

17. Prove that the sequence $\{p_k\}$ in equation (4) of Theorem 2.5 converges to $p$. Use the following steps.

(a) Show that if $p$ is a fixed point of $g(x)$ in equation (5) then $p$ is a zero of $f(x)$.

(b) If $p$ is a zero of $f(x)$ and $f'(p) \ne 0$, show that $g'(p) = 0$. Use part (b) and Theorem 2.3 to show that the sequence $\{p_k\}$ in equation (4) converges to $p$.

18. Prove equation (23) of Theorem 2.6. Use the following steps. By Theorem 1.11, we can expand $f(x)$ about $x = p_k$ to get

$f(x) = f(p_k) + f'(p_k)(x - p_k) + \tfrac{1}{2} f''(c_k)(x - p_k)^2$.

Since $p$ is a zero of $f(x)$, we set $x = p$ and obtain

$0 = f(p_k) + f'(p_k)(p - p_k) + \tfrac{1}{2} f''(c_k)(p - p_k)^2$.

(a) Now assume that $f'(x) \ne 0$ for all $x$ near the root $p$. Use the facts given above and $f'(p_k) \ne 0$ to show that

$p - p_k + \dfrac{f(p_k)}{f'(p_k)} = \dfrac{-f''(c_k)}{2f'(p_k)}(p - p_k)^2$.

(b) Assume that $f'(x)$ and $f''(x)$ do not change too rapidly, so that we can use the approximations $f'(p_k) \approx f'(p)$ and $f''(c_k) \approx f''(p)$. Now use part (a) to get

$E_{k+1} \approx \dfrac{-f''(p)}{2f'(p)} E_k^2$.

19. Suppose that $A$ is a positive real number.

(a) Show that $A$ has the representation $A = q \times 2^{2m}$, where $1/4 \le q < 1$ and $m$ is an integer.

(b) Use part (a) to show that the square root is $A^{1/2} = q^{1/2} \times 2^m$. Remark. Let $p_0 = (2q + 1)/3$, where $1/4 \le q < 1$, and use Newton's formula (11). After three iterations, $p_3$ will be an approximation to $q^{1/2}$ with a precision of 24 binary digits. This is the algorithm that is often used in the computer's hardware to compute square roots.

20. (a) Show that formula (27) for the secant method is algebraically equivalent to

$p_{k+1} = \dfrac{p_{k-1} f(p_k) - p_k f(p_{k-1})}{f(p_k) - f(p_{k-1})}$.

(b) Explain why loss of significance in subtraction makes this formula inferior for computational purposes to the one given in formula (27).

21. Suppose that $p$ is a root of order $M = 2$ for $f(x) = 0$. Prove that the accelerated Newton-Raphson iteration

$p_k = p_{k-1} - \dfrac{2f(p_{k-1})}{f'(p_{k-1})}$

converges quadratically (see Exercise 18).

22. Halley's method is another way to speed up convergence of Newton's method. The Halley iteration formula is

$g(x) = x - \dfrac{f(x)}{f'(x)} \left[ 1 - \dfrac{f(x) f''(x)}{2 (f'(x))^2} \right]^{-1}$.

The term in brackets is the modification of the Newton-Raphson formula. Halley's method will yield cubic convergence ($R = 3$) at simple zeros of $f(x)$.

(a) Start with $f(x) = x^2 - A$ and find Halley's iteration formula $g(x)$ for finding $\sqrt{A}$. Use $p_0 = 2$ to approximate $\sqrt{5}$ and compute $p_1$, $p_2$, and $p_3$.

(b) Start with $f(x) = x^3 - 3x + 2$ and find Halley's iteration formula $g(x)$. Use $p_0 = -2.4$ and compute $p_1$, $p_2$, and $p_3$.


23. A modified Newton-Raphson method for multiple roots. If $p$ is a root of multiplicity $M$, then $f(x) = (x - p)^M q(x)$, where $q(p) \ne 0$.

(a) Show that $h(x) = f(x)/f'(x)$ has a simple root at $p$.

(b) Show that when the Newton-Raphson method is applied to finding the simple root $p$ of $h(x)$ we get $g(x) = x - h(x)/h'(x)$, which becomes

$g(x) = x - \dfrac{f(x) f'(x)}{(f'(x))^2 - f(x) f''(x)}$.

(c) The iteration using $g(x)$ in part (b) converges quadratically to $p$. Explain why this happens.

(d) Zero is a root of multiplicity 3 for the function $f(x) = \sin(x^3)$. Start with $p_0 = 1$ and compute $p_1$, $p_2$, and $p_3$ using the modified Newton-Raphson method.

24. Suppose that an iterative method for solving $f(x) = 0$ produces the following four consecutive error terms (see Example 2.11): $E_0 = 0.400000$, $E_1 = 0.043797$, $E_2 = 0.000062$, and $E_3 = 0.000000$. Estimate the asymptotic error constant $A$ and the order of convergence $R$ of the sequence generated by the iterative method.


Algorithms and Programs 


1. Modify Programs 2.5 and 2.6 to display an appropriate error message when (i) division by zero occurs in (4) or (27), respectively, or (ii) the maximum number of iterations, max1, is exceeded.

2. It is often instructive to display the terms in the sequences generated by (4) and (27) (i.e., the second column of Table 2.4). Modify Programs 2.5 and 2.6 to display the sequences generated by (4) and (27), respectively.

3. Modify Program 2.5 to use Newton's square-root algorithm to approximate each of the following square roots to 10 decimal places.

(a) Start with $p_0 = 3$ and approximate $\sqrt{8}$.

(b) Start with $p_0 = 10$ and approximate $\sqrt{91}$.

(c) Start with $p_0 = -3$ and approximate $-\sqrt{8}$.

4. Modify Program 2.5 to use the cube-root algorithm in Exercise 11 to approximate each of the following cube roots to 10 decimal places.

(a) Start with $p_0 = 2$ and approximate $7^{1/3}$.

(b) Start with $p_0 = 6$ and approximate $200^{1/3}$.

(c) Start with $p_0 = -2$ and approximate $(-7)^{1/3}$.

5. Modify Program 2.5 to use the accelerated Newton-Raphson algorithm in Theorem 2.7 to find the root $p$ of order $M$ of each of the following functions.

(a) $f(x) = (x - 2)^5$, $M = 5$, $p = 2$; start with $p_0 = 1$.

(b) $f(x) = \sin(x^3)$, $M = 3$, $p = 0$; start with $p_0 = 1$.

(c) $f(x) = (x - 1)\ln(x)$, $M = 2$, $p = 1$; start with $p_0 = 2$.

6. Modify Program 2.5 to use Halley's method in Exercise 22 to find the simple zero of $f(x) = x^3 - 3x + 2$, using $p_0 = -2.4$.

7. Suppose that the equations of motion for a projectile are

$y = f(t) = 9600(1 - e^{-t/15}) - 480t$
$x = r(t) = 2400(1 - e^{-t/15})$.

(a) Find the elapsed time until impact accurate to 10 decimal places.

(b) Find the range accurate to 10 decimal places.

8. (a) Find the point on the parabola $y = x^2$ that is closest to the point (3, 1) accurate to 10 decimal places.

(b) Find the point on the graph of $y = \sin(x - \sin(x))$ that is closest to the point (2.1, 0.5) accurate to 10 decimal places.

(c) Find the value of $x$ at which the minimum vertical distance between the graphs of $f(x) = x^2 + 2$ and $g(x) = (x/5) - \sin(x)$ occurs accurate to 10 decimal places.

9. An open-top box is constructed from a rectangular piece of sheet metal measuring 10 by 16 inches. Squares of what size (accurate to 0.000000001 inch) should be cut from the corners if the volume of the box is to be 100 cubic inches?

10. A catenary is the curve formed by a hanging cable. Assume that the lowest point is (0, 0); then the formula for the catenary is $y = C\cosh(x/C) - C$. To determine the catenary that goes through $(\pm a, b)$ we must solve the equation $b = C\cosh(a/C) - C$ for $C$.

(a) Show that the catenary through $(\pm 10, 6)$ is $y = 9.1889\cosh(x/9.1889) - 9.1889$.

(b) Find the catenary that passes through $(\pm 12, 5)$.
2.5 Aitken's Process and Steffensen's and Muller's Methods (Optional)

In Section 2.4 we saw that Newton's method converged slowly at a multiple root and the sequence of iterates $\{p_k\}$ exhibited linear convergence. Theorem 2.7 showed how to speed up convergence, but it depends on knowing the order of the root in advance.


Aitken’s Process 

A technique called Aitken's $\Delta^2$ process can be used to speed up convergence of any sequence that is linearly convergent. In order to proceed, we will need a definition.

Definition 2.6. Given the sequence $\{p_n\}_{n=0}^{\infty}$, define the forward difference $\Delta p_n$ by

(1) $\Delta p_n = p_{n+1} - p_n$ for $n \ge 0$.

Higher powers $\Delta^k p_n$ are defined recursively by

(2) $\Delta^k p_n = \Delta^{k-1}(\Delta p_n)$ for $k \ge 2$.

Theorem 2.8 (Aitken's Acceleration). Assume that the sequence $\{p_n\}_{n=0}^{\infty}$ converges linearly to the limit $p$ and that $p - p_n \ne 0$ for all $n \ge 0$. If there exists a real number $A$ with $|A| < 1$ such that

(3) $\lim_{n\to\infty} \dfrac{p - p_{n+1}}{p - p_n} = A$,

then the sequence $\{q_n\}_{n=0}^{\infty}$ defined by

(4) $q_n = p_n - \dfrac{(\Delta p_n)^2}{\Delta^2 p_n} = p_n - \dfrac{(p_{n+1} - p_n)^2}{p_{n+2} - 2p_{n+1} + p_n}$

converges to $p$ faster than $\{p_n\}_{n=0}^{\infty}$, in the sense that

(5) $\lim_{n\to\infty} \left| \dfrac{p - q_n}{p - p_n} \right| = 0$.

Proof. We will show how to derive formula (4) and will leave the proof of (5) as an exercise. Since the terms in (3) are approaching a limit, we can write

(6) $\dfrac{p - p_{n+1}}{p - p_n} \approx A$ and $\dfrac{p - p_{n+2}}{p - p_{n+1}} \approx A$ when $n$ is large.

The relations in (6) imply that

(7) $(p - p_{n+1})^2 \approx (p - p_{n+2})(p - p_n)$.

When both sides of (7) are expanded and the terms $p^2$ are canceled, the result is

(8) $p \approx \dfrac{p_{n+2}\,p_n - p_{n+1}^2}{p_{n+2} - 2p_{n+1} + p_n} = q_n$ for $n = 0, 1, \ldots$.

The formula in (8) is used to define the term $q_n$. It can be rearranged algebraically to obtain formula (4), which has less error propagation when computer calculations are made. •

Example 2.18. Show that the sequence $\{p_n\}$ in Example 2.2 exhibits linear convergence, and show that the sequence $\{q_n\}$ obtained by Aitken's $\Delta^2$ process converges faster.

The sequence $\{p_n\}$ was obtained by fixed-point iteration using the function $g(x) = e^{-x}$ and starting with $p_0 = 0.5$. After convergence has been achieved, the limit is $P \approx 0.567143290$. The values $p_n$ and $q_n$ are given in Tables 2.10 and 2.11. For illustration, the value of $q_1$ is given by the calculation

$q_1 = p_1 - \dfrac{(p_2 - p_1)^2}{p_3 - 2p_2 + p_1} = 0.606530660 - \dfrac{(-0.061291448)^2}{0.095755331} = 0.567298989$. ■

Table 2.10 Linearly Convergent Sequence $\{p_n\}$

n    p_n            E_n = p_n - p    A_n = E_n/E_{n-1}
1    0.606530660    0.039387369      -0.586616609
2    0.545239212    -0.021904079     -0.556119357
3    0.579703095    0.012559805      -0.573400269
4    0.560064628    -0.007078663     -0.563596551
5    0.571172149    0.004028859      -0.569155345
6    0.564862947    -0.002280343     -0.566002341

Table 2.11 Derived Sequence $\{q_n\}$ Using Aitken's Process

n    q_n            q_n - p
1    0.567298989    0.000155699
2    0.567193142    0.000049852
3    0.567159364    0.000016074
4    0.567148453    0.000005163
5    0.567144952    0.000001662
6    0.567143825    0.000000534




Figure 2.17 The starting approximations $p_0$, $p_1$, and $p_2$ for Muller's method, and the differences $h_0$ and $h_1$.


Although the sequence $\{q_n\}$ in Table 2.11 converges linearly, it converges faster than $\{p_n\}$ in the sense of Theorem 2.8, and usually Aitken's method gives a better improvement than this. When Aitken's process is combined with fixed-point iteration, the result is called Steffensen's acceleration. The details are given in Program 2.7 and in the exercises.
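Formula (4) can be applied to a stored sequence in a short loop. The following sketch is an illustration only; the array names `p` and `q` and the length `N` are assumptions made here. It builds the fixed-point sequence for $g(x) = e^{-x}$ from $p_0 = 0.5$ and then forms the Aitken sequence.

```matlab
% Sketch: Aitken's process applied to fixed-point iteration with g(x) = exp(-x).
g = @(x) exp(-x);
N = 9;                         % number of stored terms p_0, ..., p_8
p = zeros(1,N);                % p(n+1) stores p_n
p(1) = 0.5;
for n = 1:N-1
    p(n+1) = g(p(n));          % fixed-point iteration
end
q = zeros(1,N-2);              % q(n+1) stores q_n
for n = 1:N-2
    % formula (4): q_n = p_n - (p_{n+1} - p_n)^2/(p_{n+2} - 2p_{n+1} + p_n)
    q(n) = p(n) - (p(n+1) - p(n))^2/(p(n+2) - 2*p(n+1) + p(n));
end
fprintf('%.9f  %.9f\n', [p(1:N-2); q]);
```

Each $q_n$ uses three consecutive terms of $\{p_n\}$, so the derived sequence is two terms shorter than the original.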

Muller’s Method 

Muller's method is a generalization of the secant method, in the sense that it does not require the derivative of the function. It is an iterative method that requires three starting points $(p_0, f(p_0))$, $(p_1, f(p_1))$, and $(p_2, f(p_2))$. A parabola is constructed that passes through the three points; then the quadratic formula is used to find a root of the quadratic for the next approximation. It has been proved that near a simple root Muller's method converges faster than the secant method and almost as fast as Newton's method. The method can be used to find real or complex zeros of a function and can be programmed to use complex arithmetic.

Without loss of generality, we assume that $p_2$ is the best approximation to the root and consider the parabola through the three starting values, shown in Figure 2.17. Make the change of variable

(9) $t = x - p_2$,

and use the differences

(10) $h_0 = p_0 - p_2$ and $h_1 = p_1 - p_2$.

Consider the quadratic polynomial involving the variable $t$:

(11) $y = at^2 + bt + c$.

Each point is used to obtain an equation involving $a$, $b$, and $c$:

(12)
At $t = h_0$: $ah_0^2 + bh_0 + c = f_0$,
At $t = h_1$: $ah_1^2 + bh_1 + c = f_1$,
At $t = 0$: $a0^2 + b0 + c = f_2$.

From the third equation in (12), we see that

(13) $c = f_2$.

Substituting (13) into the first two equations in (12) and using the definition $e_0 = f_0 - c$ and $e_1 = f_1 - c$ results in the linear system

(14)
$ah_0^2 + bh_0 = f_0 - c = e_0$,
$ah_1^2 + bh_1 = f_1 - c = e_1$.

Solving the linear system for $a$ and $b$ results in

(15)
$a = \dfrac{e_0 h_1 - e_1 h_0}{h_1 h_0^2 - h_0 h_1^2}$,
$b = \dfrac{e_1 h_0^2 - e_0 h_1^2}{h_1 h_0^2 - h_0 h_1^2}$.

The quadratic formula is used to find the roots $t = z_1, z_2$ of (11):

(16) $z = \dfrac{-2c}{b \pm \sqrt{b^2 - 4ac}}$.

Formula (16) is equivalent to the standard formula for the roots of a quadratic and is better in this case because we know that $c = f_2$.

To ensure stability of the method, we choose the root in (16) that has the smallest absolute value. If $b > 0$, use the positive sign with the square root, and if $b < 0$, use the negative sign. Then $p_3$ is shown in Figure 2.17 and is given by

(17) $p_3 = p_2 + z$.

To update the iterates, choose $p_0$ and $p_1$ to be the two values selected from among $\{p_0, p_1, p_2\}$ that lie closest to $p_3$ (i.e., throw out the one that is farthest away). Then replace $p_2$ with $p_3$. Although a lot of auxiliary calculations are done in Muller's method, it only requires one function evaluation per iteration.

If Muller's method is used to find the real roots of $f(x) = 0$, it is possible that one may encounter complex approximations, because the roots of the quadratic in (16) might be complex (nonzero imaginary components). In these cases the imaginary components will have a small magnitude and can be set equal to zero so that the calculations proceed with real numbers.
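A single Muller step follows formulas (9) through (17) line by line. The sketch below is illustrative and is not the book's Program 2.8; the starting values $-2.6$, $-2.5$, $-2.4$ and the function $f(x) = x^3 - 3x + 2$ are chosen here as an example, and a negative discriminant is simply replaced by zero to keep the arithmetic real.

```matlab
% Sketch: one step of Muller's method for f(x) = x^3 - 3x + 2.
f = @(x) x.^3 - 3*x + 2;
P = [-2.6 -2.5 -2.4];                  % p0, p1, p2
h0 = P(1) - P(3);                      % differences in (10)
h1 = P(2) - P(3);
c  = f(P(3));                          % (13): c = f_2
e0 = f(P(1)) - c;
e1 = f(P(2)) - c;
denom = h1*h0^2 - h0*h1^2;
a = (e0*h1 - e1*h0)/denom;             % coefficients from (15)
b = (e1*h0^2 - e0*h1^2)/denom;
disc = sqrt(max(b^2 - 4*a*c, 0));      % suppress a complex square root
if b < 0
    disc = -disc;                      % sign choice gives the root of smallest magnitude
end
z  = -2*c/(b + disc);                  % formula (16)
p3 = P(3) + z;                         % formula (17)
fprintf('a = %.4f  b = %.4f  p3 = %.9f\n', a, b, p3);
```

Matching the sign of the square root to the sign of $b$ makes the denominator in (16) as large as possible in magnitude, which is what selects the smaller root $z$.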
Table 2.12 Comparison of Convergence near a Simple Root

k    Secant method    Muller's method    Newton's method    Steffensen with Newton
0    -2.600000000     -2.600000000       -2.400000000       -2.400000000
1    -2.400000000     -2.500000000       -2.076190476       -2.076190476
2    -2.106598985     -2.400000000       -2.003596011       -2.003596011
3    -2.022641412     -1.985275287       -2.000008589       -1.982618143
4    -2.001511098     -2.000334062       -2.000000000       -2.000204982
5    -2.000022537     -2.000000218                          -2.000000028
6    -2.000000022     -2.000000000                          -2.000002389
7    -2.000000000                                           -2.000000000


Comparison of Methods 

Steffensen's method can be used together with the Newton-Raphson fixed-point function $g(x) = x - f(x)/f'(x)$. In the next two examples we look at the roots of the polynomial $f(x) = x^3 - 3x + 2$. The Newton-Raphson function is $g(x) = (2x^3 - 2)/(3x^2 - 3)$. When this function is used in Program 2.7, we get the calculations under the heading Steffensen with Newton in Tables 2.12 and 2.13. For example, starting with $p_0 = -2.4$, we would compute

(18) $p_1 = g(p_0) = -2.076190476$,

and

(19) $p_2 = g(p_1) = -2.003596011$.

Then Aitken's improvement will give $p_3 = -1.982618143$.

Example 2.19 (Convergence near a Simple Root). This is a comparison of methods for the function $f(x) = x^3 - 3x + 2$ near the simple root $p = -2$.

Newton's method and the secant method for this function were given in Examples 2.14 and 2.16, respectively. Table 2.12 provides a summary of calculations for the methods. ■

Example 2.20 (Convergence near a Double Root). This is a comparison of the methods for the function $f(x) = x^3 - 3x + 2$ near the double root $p = 1$. Table 2.13 provides a summary of calculations. ■

Newton's method is the best choice for finding a simple root (see Table 2.12). At a double root, either Muller's method or Steffensen's method with the Newton-Raphson formula is a good choice (see Table 2.13). Note in the Aitken's acceleration formula (4) that division by zero can occur as the sequence $\{p_k\}$ converges. In this case, the last calculated approximation should be used as the approximation to the zero of $f$.
Table 2.13 Comparison of Convergence near a Double Root

k    Secant method    Muller's method    Newton's method    Steffensen with Newton
0    1.400000000      1.400000000        1.200000000        1.200000000
1    1.200000000      1.300000000        1.103030303        1.103030303
2    1.138461538      1.200000000        1.052356417        1.052356417
3    1.083873738      1.003076923        1.026400814        0.996890433
4    1.053093854      1.003838922        1.013257734        0.998446023
5    1.032853156      1.000027140        1.006643418        0.999223213
6    1.020429426      0.999997914        1.003325375        0.999999193
7    1.012648627      0.999999747        1.001663607        0.999999597
8    1.007832124      1.000000000        1.000832034        0.999999798
9    1.004844757                         1.000416075        0.999999999


In the following program the sequence $\{p_k\}$, generated by Steffensen's method with the Newton-Raphson formula, is stored in a matrix Q that has max1 rows and three columns. The first column of Q contains the initial approximation to the root, $p_0$, and the terms $p_3, p_6, \ldots, p_{3k}, \ldots$ generated by Aitken's acceleration method (4). The second and third columns of Q contain the terms generated by Newton's method. The stopping criteria in the program are based on the difference between consecutive terms from the first column of Q.

Program 2.7 (Steffensen's Acceleration). To quickly find a solution of the fixed-point equation $x = g(x)$ given an initial approximation $p_0$, where it is assumed that both $g(x)$ and $g'(x)$ are continuous, $|g'(x)| < 1$, and that ordinary fixed-point iteration converges slowly (linearly) to $p$.

function [p,Q]=steff(f,df,p0,delta,epsilon,max1)
%Input  - f is the object function input as a string 'f'
%       - df is the derivative of f input as a string 'df'
%       - p0 is the initial approximation to a zero of f
%       - delta is the tolerance for p0
%       - epsilon is the tolerance for the function values y
%       - max1 is the maximum number of iterations
%Output - p is the Steffensen approximation to the zero
%       - Q is the matrix containing the Steffensen sequence
%Initialize the matrix R
R=zeros(max1,3);
R(1,1)=p0;
for k=1:max1
    for j=2:3
        %Denominator in Newton-Raphson method is calculated
        nrdenom=feval(df,R(k,j-1));
        %Calculate Newton-Raphson approximations
        if nrdenom==0
            'division by zero in Newton-Raphson method'
            break
        else
            R(k,j)=R(k,j-1)-feval(f,R(k,j-1))/nrdenom;
        end
        %Denominator in Aitken's Acceleration process calculated
        aadenom=R(k,3)-2*R(k,2)+R(k,1);
        %Calculate Aitken's Acceleration approximations
        if aadenom==0
            'division by zero in Aitken''s Acceleration'
            break
        else
            R(k+1,1)=R(k,1)-(R(k,2)-R(k,1))^2/aadenom;
        end
    end
    %End program if division by zero occurred
    if (nrdenom==0)|(aadenom==0)
        break
    end
    %Stopping criteria are evaluated
    err=abs(R(k,1)-R(k+1,1));
    relerr=err/(abs(R(k+1,1))+delta);
    y=feval(f,R(k+1,1));
    %abs(y) is used so that a negative function value does not end the loop early
    if (err<delta)|(relerr<delta)|(abs(y)<epsilon)
        %p and the matrix Q are determined
        p=R(k+1,1);
        Q=R(1:k+1,:);
        break
    end
end



Program 2.8 (Muller's Method). To find a root of the equation $f(x) = 0$ given
three distinct initial approximations $p_0$, $p_1$, and $p_2$.

function [p,y,err]=muller(f,p0,p1,p2,delta,epsilon,max1)
%Input  - f is the object function input as a string 'f'
%       - p0, p1, and p2 are the initial approximations
%       - delta is the tolerance for p0, p1, and p2
%       - epsilon is the tolerance for the function values y
%       - max1 is the maximum number of iterations
%Output - p is the Muller approximation to the zero of f
%       - y is the function value y = f(p)
%       - err is the error in the approximation of p

%Initialize the matrices P and Y
P=[p0 p1 p2];
Y=feval(f,P);
%Calculate a and b in formula (15)
for k=1:max1
   h0=P(1)-P(3);h1=P(2)-P(3);e0=Y(1)-Y(3);e1=Y(2)-Y(3);c=Y(3);
   denom=h1*h0^2-h0*h1^2;
   a=(e0*h1-e1*h0)/denom;
   b=(e1*h0^2-e0*h1^2)/denom;
   %Suppress any complex roots
   if b^2-4*a*c > 0
      disc=sqrt(b^2-4*a*c);
   else
      disc=0;
   end
   %Find the smallest root of (17)
   if b < 0
      disc=-disc;
   end
   z=-2*c/(b+disc);
   p=P(3)+z;
   %Sort the entries of P to find the two closest to p
   if abs(p-P(2))<abs(p-P(1))
      Q=[P(2) P(1) P(3)];
      P=Q;
      Y=feval(f,P);
   end
   if abs(p-P(3))<abs(p-P(2))
      R=[P(1) P(3) P(2)];
      P=R;
      Y=feval(f,P);
   end
   %Replace the entry of P that was farthest from p with p
   P(3)=p;
   Y(3)=feval(f,P(3));
   y=Y(3);
   %Determine stopping criteria
   err=abs(z);
   relerr=err/(abs(p)+delta);
   if (err<delta)||(relerr<delta)||(abs(y)<epsilon)
      break
   end
end
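
A single pass through the loop of Program 2.8 can be traced by hand. The sketch below repeats the arithmetic of one Muller step outside the program; the function $f(x) = x^2 - 2$ and the three starting points are assumptions chosen for illustration. Since this f is itself a parabola, the interpolating quadratic coincides with f and one step should reproduce $\sqrt{2}$ up to rounding.

```matlab
% Illustrative sketch: one Muller step for an assumed example
% f(x) = x^2 - 2 with hypothetical starting points 1, 1.5, and 2.
f = @(x) x.^2 - 2;
P = [1 1.5 2];
Y = f(P);
% Differences measured from the third point
h0 = P(1)-P(3);  h1 = P(2)-P(3);
e0 = Y(1)-Y(3);  e1 = Y(2)-Y(3);  c = Y(3);
% Coefficients of the interpolating parabola a*t^2 + b*t + c, t = x - P(3)
denom = h1*h0^2 - h0*h1^2;
a = (e0*h1 - e1*h0)/denom;
b = (e1*h0^2 - e0*h1^2)/denom;
% Real discriminant only (complex roots are suppressed)
disc = sqrt(max(b^2 - 4*a*c, 0));
if b < 0
   disc = -disc;      % choose the sign that gives the smaller step
end
z = -2*c/(b + disc);
p = P(3) + z;
fprintf('a = %g, b = %g, new approximation p = %.10f\n', a, b, p);
```

The form $z = -2c/(b \pm \sqrt{b^2 - 4ac})$ is used rather than the usual quadratic formula because it avoids subtracting nearly equal quantities when c is small.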


Exercises for Aitken's, Steffensen's, and Muller's Methods

1. Find $\Delta p_n$, where

(a) $p_n = 5$    (b) $p_n = 6n + 2$    (c) $p_n = n(n + 1)$

2. Let $p_n = 2n^2 + 1$. Find $\Delta^k p_n$, where

(a) $k = 2$    (b) $k = 3$    (c) $k = 4$

3. Let $p_n = 1/2^n$. Show that $q_n = 0$ for all n, where $q_n$ is given by formula (4).

4. Let $p_n = 1/n$. Show that $q_n = 1/(2n + 2)$ for all n; hence there is little acceleration
of convergence. Does $\{p_n\}$ converge to 0 linearly? Why?

5. Let $p_n = 1/(2^n - 1)$. Show that $q_n = 1/(4^{n+1} - 1)$ for all n.

6. The sequence $p_n = 1/(4^n + 4^{-n})$ converges linearly to 0. Use Aitken's formula (4)
to find $q_1$, $q_2$, and $q_3$, and hence speed up the convergence.

7. The sequence $\{p_n\}$ generated by fixed-point iteration starting with $p_0 = 2.5$ and using
the function $g(x) = (6 + x)^{1/2}$ converges linearly to $p = 3$. Use Aitken's formula
(4) to find $q_1$, $q_2$, and $q_3$, and hence speed up the convergence.

8. The sequence $\{p_n\}$ generated by fixed-point iteration, starting with $p_0 = 3.14$, and
using the function $g(x) = \ln(x) + 2$ converges linearly to $p \approx 3.1419322$. Use
Aitken's formula (4) to find $q_1$, $q_2$, and $q_3$, and hence speed up the convergence.

9. For the equation $\cos(x) - 1 = 0$, the Newton-Raphson function is $g(x) = x - (1 -
\cos(x))/\sin(x) = x - \tan(x/2)$. Use Steffensen's algorithm with $g(x)$ and start with
$p_0 = 0.5$, and find $p_1$, $p_2$, and $p_3$; then find $p_4$, $p_5$, and $p_6$.

10. Convergence of series. Aitken’s method can be used to speed up the convergence of 
a series. If the nth partial sum of the series is 

    $S_n = \sum_{k=1}^{n} A_k,$

show that the derived series using Aitken's method is 



14. — £" =I ^ 

15. Use Muller's method to find the root of $f(x) = x^3 - x - 2$. Start with $p_0 = 1.0$,
$p_1 = 1.2$, and $p_2 = 1.4$ and find $p_3$, $p_4$, and $p_5$.

16. Use Muller's method to find the root of $f(x) = 4x^2 - e^x$. Start with $p_0 = 4.0$,
$p_1 = 4.1$, and $p_2 = 4.2$ and find $p_3$, $p_4$, and $p_5$.

17. Let $\{p_n\}$ and $\{q_n\}$ be any two sequences of real numbers. Show that

(a) $\Delta(p_n + q_n) = \Delta p_n + \Delta q_n$

(b) $\Delta(p_n q_n) = p_{n+1}\,\Delta q_n + q_n\,\Delta p_n$

18. Start with formula (8), add the terms $p_{n+2}$ and $-p_{n+2}$ to the right side, and show that
an equivalent formula is

    $p \approx p_{n+2} - \frac{(p_{n+2} - p_{n+1})^2}{p_{n+2} - 2p_{n+1} + p_n} = q_n.$

19. Assume that the error in an iteration process satisfies the relation $E_{n+1} = KE_n$ for
some constant K and $|K| < 1$.

(a) Find an expression for $E_n$ that involves $E_0$, K, and n.

(b) Find an expression for the smallest integer N so that $|E_N| < 10^{-8}$.

Algorithms and Programs


1. Use Steffensen's method with the initial approximation $p_0 = 0.5$ to approximate the
zero of $f(x) = x - \sin(x)$ accurate to 10 decimal places.

2. Use Steffensen's method with the initial approximation $p_0 = 0.5$ to approximate the
zero of $f(x) = \sin(x^3)$ closest to 0.5 accurate to 10 decimal places.

3. Use Muller's method with the initial approximations $p_0 = 1.5$, $p_1 = 1.4$, and
$p_2 = 1.3$ to find a zero of $f(x) = 1 + 2x - \tan(x)$ accurate to 12 decimal places.

4. In Program 2.8 (Muller's method) a 1 x 3 matrix P is initialized with $p_0$, $p_1$, and $p_2$.
Then at the end of the loop, one of the values $p_0$, $p_1$, or $p_2$ is replaced with the new
approximation to the zero. This process is continued until the stopping criteria are
satisfied, say at k = K. Modify Program 2.8 so that, in addition to p and err, a
(K + 1) x 3 matrix Q is produced such that the first row of Q contains the 1 x 3
matrix P with the initial approximations to the zero, and the kth row of Q contains
the kth set of three approximations to the zero.

Use this modification of Program 2.8 with the initial approximations $p_0 = 2.4$,
$p_1 = 2.3$, and $p_2 = 2.2$ to find a zero of $f(x) = 3\cos(x) + 2\sin(x)$ accurate to
8 decimal places.
The Solution of Linear Systems
AX = B


Three planes form the boundary of a solid in the first octant, which is shown in Fig-
ure 3.1. Suppose that the equations for these planes are

    $5x + y + z = 5$
    $x + 4y + z = 4$
    $x + y + 3z = 3.$

What are the coordinates of the point of intersection of the three planes? Gaussian
elimination can be used to find the solution of the linear system

    $x = 0.76$, $y = 0.68$, and $z = 0.52$.

In this chapter we develop numerical methods for solving systems of linear equations. 
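
The intersection point quoted above can be checked numerically. The sketch below is only a check, not one of the methods developed in this chapter: it assumes MATLAB's built-in backslash operator, and the variable names A, B, and X are chosen here for illustration.

```matlab
% Illustrative check of the three-plane example using the built-in solver.
A = [5 1 1;
     1 4 1;
     1 1 3];          % coefficients of x, y, z in the three plane equations
B = [5; 4; 3];        % right-hand sides
X = A\B;              % solve AX = B
fprintf('x = %.2f, y = %.2f, z = %.2f\n', X(1), X(2), X(3));
```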


3.1 Introduction to Vectors and Matrices

A real N-dimensional vector X is an ordered set of N real numbers and is usually
written in the coordinate form

(1)    $X = (x_1, x_2, \ldots, x_N)$.

Here the numbers $x_1, x_2, \ldots,$ and $x_N$ are called the components of X. The set con-
sisting of all N-dimensional vectors is called N-dimensional space. When a vector is
used to denote a point or position in space, it is called a position vector. When it is
used to denote a movement between two points in space, it is called a displacement
vector.

Figure 3.1 The intersection of three planes.

Let another vector be $Y = (y_1, y_2, \ldots, y_N)$. The two vectors X and Y are said to
be equal if and only if each corresponding coordinate is the same; that is,

(2)    $X = Y$ if and only if $x_j = y_j$ for $j = 1, 2, \ldots, N$.

The sum of the vectors X and Y is computed component by component, using the
definition

(3)    $X + Y = (x_1 + y_1, x_2 + y_2, \ldots, x_N + y_N)$.

The negative of the vector X is obtained by replacing each coordinate with its
negative:

(4)    $-X = (-x_1, -x_2, \ldots, -x_N)$.

The difference Y - X is formed by taking the difference in each coordinate:

(5)    $Y - X = (y_1 - x_1, y_2 - x_2, \ldots, y_N - x_N)$.
Vectors in N-dimensional space obey the algebraic property

(6)    $Y - X = Y + (-X)$.

If c is a real number (scalar), we define scalar multiplication cX as follows:

(7)    $cX = (cx_1, cx_2, \ldots, cx_N)$.

If c and d are scalars, then the weighted sum cX + dY is called a linear combina-
tion of X and Y, and we write

(8)    $cX + dY = (cx_1 + dy_1, cx_2 + dy_2, \ldots, cx_N + dy_N)$.

The dot product of the two vectors X and Y is a scalar quantity (real number)
defined by the equation

(9)    $X \cdot Y = x_1y_1 + x_2y_2 + \cdots + x_Ny_N$.

The norm (or length) of the vector X is defined by

(10)    $\|X\| = (x_1^2 + x_2^2 + \cdots + x_N^2)^{1/2}$.

Equation (10) is referred to as the Euclidean norm (or length) of the vector X.

Scalar multiplication cX stretches the vector X when $|c| > 1$ and shrinks the
vector when $|c| < 1$. This is shown by using equation (10):

(11)    $\|cX\| = (c^2x_1^2 + c^2x_2^2 + \cdots + c^2x_N^2)^{1/2}$
               $= |c|(x_1^2 + x_2^2 + \cdots + x_N^2)^{1/2} = |c|\,\|X\|$.

An important relationship exists between the dot product and norm of a vector. If
both sides of equation (10) are squared and equation (9) is used, with Y being replaced
with X, we have

(12)    $\|X\|^2 = x_1^2 + x_2^2 + \cdots + x_N^2 = X \cdot X$.

If X and Y are position vectors that locate the two points $(x_1, x_2, \ldots, x_N)$ and
$(y_1, y_2, \ldots, y_N)$ in N-dimensional space, then the displacement vector from X to Y
is given by the difference

(13)    $Y - X$    (displacement from position X to position Y).

Notice that if a particle starts at the position X and moves through the displacement
Y - X, its new position is Y. This can be obtained by the following vector sum:

(14)    $Y = X + (Y - X)$.

Using equations (10) and (13), we can write down the formula for the distance
between two points in N-space.

(15)    $\|Y - X\| = ((y_1 - x_1)^2 + (y_2 - x_2)^2 + \cdots + (y_N - x_N)^2)^{1/2}$.

When the distance between points is computed using formula (15), we say that the 
points lie in N-dimensional Euclidean space. 
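
The vector operations defined in this section map directly onto MATLAB expressions. The following is a small sketch; the two vectors are assumptions chosen so that the norms come out as whole numbers, and `dot` and `norm` are built-in MATLAB functions.

```matlab
% Illustrative sketch of equations (3)-(15) with two assumed vectors.
X = [1 2 2];
Y = [4 6 2];
S    = X + Y;          % sum, equation (3)
D    = Y - X;          % displacement from X to Y, equations (5) and (13)
L    = 2*X - 3*Y;      % linear combination cX + dY with c = 2, d = -3
d    = dot(X, Y);      % dot product, equation (9)
nX   = norm(X);        % Euclidean norm, equation (10)
dist = norm(Y - X);    % distance between the points, equation (15)
fprintf('X.Y = %g, ||X|| = %g, distance = %g\n', d, nX, dist);
```

Here `norm(X)^2` and `dot(X, X)` agree, which is relation (12).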
(22)    $X - X = X + (-X) = 0$            additive inverse

(23)    $(X + Y) + Z = X + (Y + Z)$      associative property

(24)    $(a + b)X = aX + bX$              distributive property for scalars

(25)    $a(X + Y) = aX + aY$              distributive property for vectors

(26)    $a(bX) = (ab)X$                   associative property for scalars


Matrices and Two-dimensional Arrays 

A matrix is a rectangular array of numbers that is arranged systematically in rows and
columns. A matrix having M rows and N columns is called an M x N (read "M by N")
matrix. The capital letter A denotes a matrix, and the lowercase subscripted letter $a_{ij}$
denotes one of the numbers forming the matrix. We write

(27)    $A = [a_{ij}]_{M \times N}$ for $1 \le i \le M$, $1 \le j \le N$,

where $a_{ij}$ is the number in location (i, j) (i.e., stored in the ith row and jth column
of the matrix). We refer to $a_{ij}$ as the element in location (i, j). In expanded form we
write

(28)    $A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1N} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2N} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{iN} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{M1} & a_{M2} & \cdots & a_{Mj} & \cdots & a_{MN}
\end{bmatrix}$,

where the entries $a_{i1}, \ldots, a_{iN}$ form row i and the entries $a_{1j}, \ldots, a_{Mj}$ form column j.

The rows of the M x N matrix A are N-dimensional vectors:

(29)    $V_i = (a_{i1}, a_{i2}, \ldots, a_{iN})$ for $i = 1, 2, \ldots, M$.

The row vectors in (29) can also be viewed as 1 x N matrices. Here we have sliced
the M x N matrix A into M pieces (submatrices) that are 1 x N matrices.

In this case we could express A as an M x 1 matrix consisting of the 1 x N row
matrices $V_i$; that is,

(30)    $A = \begin{bmatrix} V_1 \\ V_2 \\ \vdots \\ V_i \\ \vdots \\ V_M \end{bmatrix}
           = [V_1 \; V_2 \; \cdots \; V_i \; \cdots \; V_M]'$.
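Similarly, the columns of the M x N matrix A are M x 1 matrices:

(31)    $C_1 = \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{M1} \end{bmatrix}, \; \ldots, \;
         C_j = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{Mj} \end{bmatrix}, \; \ldots, \;
         C_N = \begin{bmatrix} a_{1N} \\ a_{2N} \\ \vdots \\ a_{MN} \end{bmatrix}$.

In this case we could express A as a 1 x N matrix consisting of the M x 1 column
matrices $C_j$:

(32)    $A = [C_1 \; C_2 \; \cdots \; C_j \; \cdots \; C_N]$.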

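Example 3.2. Identify the row and column matrices associated with the 4 x 3 matrix

    $A = \begin{bmatrix} -2 & 4 & 9 \\ 5 & -7 & 1 \\ 0 & -3 & 8 \\ -4 & 6 & -5 \end{bmatrix}$.

The four row matrices are $V_1 = [-2 \; 4 \; 9]$, $V_2 = [5 \; -7 \; 1]$, $V_3 = [0 \; -3 \; 8]$,
and $V_4 = [-4 \; 6 \; -5]$. The three column matrices are

    $C_1 = \begin{bmatrix} -2 \\ 5 \\ 0 \\ -4 \end{bmatrix}$,
    $C_2 = \begin{bmatrix} 4 \\ -7 \\ -3 \\ 6 \end{bmatrix}$, and
    $C_3 = \begin{bmatrix} 9 \\ 1 \\ 8 \\ -5 \end{bmatrix}$.

Notice how A can be represented with these matrices:

    $A = \begin{bmatrix} V_1 \\ V_2 \\ V_3 \\ V_4 \end{bmatrix} = [C_1 \; C_2 \; C_3]$.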
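
In MATLAB the row and column matrices of Example 3.2 are obtained with colon indexing. The sketch below is illustrative; the names V2, C3, Arows, and Acols are chosen here and do not come from the text.

```matlab
% Illustrative sketch: slicing the matrix of Example 3.2 into rows and columns.
A = [-2  4  9;
      5 -7  1;
      0 -3  8;
     -4  6 -5];
V2 = A(2,:);                                 % second row matrix (1 x 3)
C3 = A(:,3);                                 % third column matrix (4 x 1)
Arows = [A(1,:); A(2,:); A(3,:); A(4,:)];    % A rebuilt from its rows, as in (30)
Acols = [A(:,1) A(:,2) A(:,3)];              % A rebuilt from its columns, as in (32)
disp(V2)
disp(C3')
```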


Let $A = [a_{ij}]_{M \times N}$ and $B = [b_{ij}]_{M \times N}$ be two matrices of the same dimension.
The two matrices A and B are said to be equal if and only if each corresponding
element is the same; that is,

(33)    $A = B$ if and only if $a_{ij} = b_{ij}$ for $1 \le i \le M$, $1 \le j \le N$.

The sum of the two M x N matrices A and B is computed element by element,
using the definition

(34)    $A + B = [a_{ij} + b_{ij}]_{M \times N}$ for $1 \le i \le M$, $1 \le j \le N$.

The negative of the matrix A is obtained by replacing each element with its nega-
tive:

(35)    $-A = [-a_{ij}]_{M \times N}$ for $1 \le i \le M$, $1 \le j \le N$.

The difference A - B is formed by taking the difference of corresponding coordi-
nates:

(36)    $A - B = [a_{ij} - b_{ij}]_{M \times N}$ for $1 \le i \le M$, $1 \le j \le N$.

If c is a real number (scalar), we define scalar multiplication cA as follows:

(37)    $cA = [ca_{ij}]_{M \times N}$ for $1 \le i \le M$, $1 \le j \le N$.

If p and q are scalars, the weighted sum pA + qB is called a linear combination
of the matrices A and B, and we write

(38)    $pA + qB = [pa_{ij} + qb_{ij}]_{M \times N}$ for $1 \le i \le M$, $1 \le j \le N$.

The zero matrix of order M x N consists of all zeros:

(39)    $0 = [0]_{M \times N}$.



Theorem 3.2 (Matrix Addition). Suppose that A, B, and C are M x N matrices
and p and q are scalars. The following properties of matrix addition and scalar multi-
plication hold:

(40)    $B + A = A + B$                   commutative property

(41)    $0 + A = A + 0$                   additive identity

(42)    $A - A = A + (-A) = 0$            additive inverse

(43)    $(A + B) + C = A + (B + C)$       associative property

(44)    $(p + q)A = pA + qA$              distributive property for scalars

(45)    $p(A + B) = pA + pB$              distributive property for matrices

(46)    $p(qA) = (pq)A$                   associative property for scalars
Exercises for Introduction to Vectors and Matrices

The reader is encouraged to carry out the following exercises by hand and with MATLAB.

1. Given the vectors X and Y, find (a) $X + Y$, (b) $X - Y$, (c) $3X$, (d) $\|X\|$, (e) $7Y - 4X$,
(f) $X \cdot Y$, and (g) $\|7Y - 4X\|$.

(i) $X = (3, -4)$ and $Y = (-2, 8)$

(ii) $X = (-6, 3, 2)$ and $Y = (-8, 5, 1)$

(iii) $X = (4, -8, 1)$ and $Y = (1, -12, -11)$

(iv) $X = (1, -2, 4, 2)$ and $Y = (3, -5, -4, 0)$

2. Using the law of cosines, it can be shown that the angle $\theta$ between two vectors X and
Y is given by the relation

    $\cos(\theta) = \frac{X \cdot Y}{\|X\|\,\|Y\|}$.

Find the angle, in radians, between the following vectors:

(a) $X = (-6, 3, 2)$ and $Y = (2, -2, 1)$

(b) $X = (4, -8, 1)$ and $Y = (3, 4, 12)$

3. Two vectors X and Y are said to be orthogonal (perpendicular) if the angle between
them is $\pi/2$.

(a) Prove that X and Y are orthogonal if and only if $X \cdot Y = 0$.

Use part (a) to determine if the following vectors are orthogonal.

(b) $X = (-6, 4, 2)$ and $Y = (6, 5, 8)$

(c) $X = (-4, 8, 3)$ and $Y = (2, 5, 16)$

(d) $X = (-5, 7, 2)$ and $Y = (4, 1, 6)$

(e) Find two different vectors that are orthogonal to $X = (1, 2, -5)$.

4. Find (a) $A + B$, (b) $A - B$, and (c) $3A - 2B$ for the matrices

    $A = \begin{bmatrix} -1 & 9 & 4 \\ 2 & -3 & -6 \\ 0 & 5 & 7 \end{bmatrix}$


r - 4 9 

B = 3 -5 


5. The transpose of an M x N matrix A, denoted A', is the N x M matrix obtained
from A by converting the rows of A to columns of A'. That is, if $A = [a_{ij}]_{M \times N}$ and
$A' = [b_{ij}]_{N \times M}$, then the elements satisfy the relation

    $b_{ji} = a_{ij}$ for $1 \le i \le M$, $1 \le j \le N$.

Find the transpose of the following matrices.

~~2 5 t2l 

(a) \ J ~l (h 

11-3 8 


4 9 2 
3 5 7 
8 1 6 
6. The square matrix A of dimension N x N is said to be symmetric if A = A' (see
Exercise 5 for the definition of A'). Determine whether the following square matrices
are symmetric.

(a) $\begin{bmatrix} 1 & -7 & 4 \\ -7 & 2 & 0 \\ 4 & 0 & 3 \end{bmatrix}$
(b) $\begin{bmatrix} 4 & -7 & 1 \\ 0 & 2 & -7 \\ 3 & 0 & 4 \end{bmatrix}$

(c) A = [flyJ Wx jv» where 






(d) A - taijlNx/v^whereoij = {^ uavy '' , \ \ 

7. Prove statements (20), (24), and (25) in Theorem 3.1.


3.2 Properties of Vectors and Matrices

A linear combination of the variables $x_1, x_2, \ldots, x_N$ is a sum

(1)    $a_1x_1 + a_2x_2 + \cdots + a_Nx_N$,

where $a_k$ is the coefficient of $x_k$ for $k = 1, 2, \ldots, N$.

A linear equation in $x_1, x_2, \ldots, x_N$ is obtained by requiring the linear combination
in (1) to take on a prescribed value b; that is,

(2)    $a_1x_1 + a_2x_2 + \cdots + a_Nx_N = b$.

Systems of linear equations arise frequently, and if M equations in N unknowns
are given, we write

(3)
    $a_{11}x_1 + a_{12}x_2 + \cdots + a_{1N}x_N = b_1$
    $a_{21}x_1 + a_{22}x_2 + \cdots + a_{2N}x_N = b_2$
    $\vdots$
    $a_{k1}x_1 + a_{k2}x_2 + \cdots + a_{kN}x_N = b_k$
    $\vdots$
    $a_{M1}x_1 + a_{M2}x_2 + \cdots + a_{MN}x_N = b_M.$

To keep track of the different coefficients in each equation, it is necessary to use the
two subscripts (k, j). The first subscript locates equation k and the second subscript
locates the variable $x_j$.

A solution to (3) is a set of numerical values $x_1, x_2, \ldots, x_N$ that satisfies all the
equations in (3) simultaneously. Hence a solution can be viewed as an N-dimensional
vector:

(4)    $X = (x_1, x_2, \ldots, x_N)$.





Example 3.4. Concrete (used for sidewalks, etc.) is a mixture of portland cement, sand,
and gravel. A distributor has three batches available for contractors. Batch 1 contains ce-
ment, sand, and gravel mixed in the proportions 1/8, 3/8, 4/8; batch 2 has the proportions
2/10, 5/10, 3/10; and batch 3 has the proportions 2/5, 3/5, 0/5.

Let $x_1$, $x_2$, and $x_3$ denote the amount (in cubic yards) to be used from each batch to
form a mixture of 10 cubic yards. Also, suppose that the mixture is to contain $b_1 = 2.3$,
$b_2 = 4.8$, and $b_3 = 2.9$ cubic yards of portland cement, sand, and gravel, respectively.
Then the system of linear equations of the ingredients is

       $0.125x_1 + 0.200x_2 + 0.400x_3 = 2.3$    (cement)
(5)    $0.375x_1 + 0.500x_2 + 0.600x_3 = 4.8$    (sand)
       $0.500x_1 + 0.300x_2 + 0.000x_3 = 2.9$    (gravel)

The solution to the linear system (5) is $x_1 = 4$, $x_2 = 3$, and $x_3 = 3$, which can be verified
by direct substitution into the equations:

    $(0.125)(4) + (0.200)(3) + (0.400)(3) = 2.3$
    $(0.375)(4) + (0.500)(3) + (0.600)(3) = 4.8$
    $(0.500)(4) + (0.300)(3) + (0.000)(3) = 2.9$


Matrix Multiplication 

Definition 3.1. If $A = [a_{ik}]_{M \times N}$ and $B = [b_{kj}]_{N \times P}$ are two matrices with the
property that A has as many columns as B has rows, then the matrix product AB is
defined to be the matrix C of dimension M x P:

(6)    $AB = C = [c_{ij}]_{M \times P}$,

where the element $c_{ij}$ of C is given by the dot product of the ith row of A and the jth
column of B:

(7)    $c_{ij} = \sum_{k=1}^{N} a_{ik}b_{kj} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{iN}b_{Nj}$

for $i = 1, 2, \ldots, M$ and $j = 1, 2, \ldots, P$.

Example 3.5. Find the product C = AB for the following matrices, and tell why BA is
not defined.

    $A = \begin{bmatrix} 2 & 3 \\ -1 & 4 \end{bmatrix}$,
    $B = \begin{bmatrix} 5 & -2 & 1 \\ 3 & 8 & -6 \end{bmatrix}$

The matrix A has two columns and B has two rows, so the matrix product AB is
defined. The product of a 2 x 2 and a 2 x 3 matrix is a 2 x 3 matrix. Computation reveals

    $AB = \begin{bmatrix} 2 & 3 \\ -1 & 4 \end{bmatrix}
          \begin{bmatrix} 5 & -2 & 1 \\ 3 & 8 & -6 \end{bmatrix}
        = \begin{bmatrix} 10 + 9 & -4 + 24 & 2 - 18 \\ -5 + 12 & 2 + 32 & -1 - 24 \end{bmatrix}
        = \begin{bmatrix} 19 & 20 & -16 \\ 7 & 34 & -25 \end{bmatrix} = C.$

When an attempt is made to form the product BA, we discover that the dimensions are
not compatible: the rows of B are three-dimensional vectors and the
columns of A are two-dimensional vectors. Hence the dot product of the jth row of B and
the kth column of A is not defined.
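
The product in Example 3.5 can be reproduced with MATLAB's `*` operator. The sketch below also tests, without actually forming BA, whether the inner dimensions agree; the variable name canFormBA is chosen here for illustration.

```matlab
% Illustrative sketch for Example 3.5: form AB and test whether BA is defined.
A = [ 2 3;
     -1 4];
B = [5 -2  1;
     3  8 -6];
C = A*B;                                  % 2 x 2 times 2 x 3 gives 2 x 3
% BA requires (number of columns of B) == (number of rows of A)
canFormBA = (size(B,2) == size(A,1));
disp(C)
fprintf('BA defined? %d\n', canFormBA);
```

Typing `B*A` for these matrices makes MATLAB report a dimension error, which is the computational counterpart of the statement that BA is not defined.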

If it happens that AB = BA, we say that A and B commute. Most often, even 
when AB and B A are both defined, the products are not necessarily the same. 

We now discuss how to use matrices to represent a linear system of equations.
The linear equations in (3) can be written as a matrix product. The coefficients $a_{kj}$
are stored in a matrix A (called the coefficient matrix) of dimension M x N, and the
unknowns $x_j$ are stored in a matrix X of dimension N x 1. The constants $b_k$ are stored
in a matrix B of dimension M x 1. It is conventional to use column matrices for both
X and B and write

(8)    $AX = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1N} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2N} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{k1} & a_{k2} & \cdots & a_{kj} & \cdots & a_{kN} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{M1} & a_{M2} & \cdots & a_{Mj} & \cdots & a_{MN}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_j \\ \vdots \\ x_N \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \\ \vdots \\ b_M \end{bmatrix} = B.$

The matrix multiplication AX = B in (8) is reminiscent of the dot product for
ordinary vectors, because each element $b_k$ in B is the result obtained by taking the dot
product of row k in matrix A with the column matrix X.

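Example 3.6. Express the system of linear equations (5) in Example 3.4 as a matrix
product. Use matrix multiplication to verify that $[4 \; 3 \; 3]'$ is the solution of (5):

(9)    $\begin{bmatrix} 0.125 & 0.200 & 0.400 \\ 0.375 & 0.500 & 0.600 \\ 0.500 & 0.300 & 0.000 \end{bmatrix}
        \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
      = \begin{bmatrix} 2.3 \\ 4.8 \\ 2.9 \end{bmatrix}.$

To verify that $[4 \; 3 \; 3]'$ is the solution of (5), we must show that $A[4 \; 3 \; 3]' =
[2.3 \; 4.8 \; 2.9]'$:

    $\begin{bmatrix} 0.125 & 0.200 & 0.400 \\ 0.375 & 0.500 & 0.600 \\ 0.500 & 0.300 & 0.000 \end{bmatrix}
     \begin{bmatrix} 4 \\ 3 \\ 3 \end{bmatrix}
   = \begin{bmatrix} 0.5 + 0.6 + 1.2 \\ 1.5 + 1.5 + 1.8 \\ 2.0 + 0.9 + 0.0 \end{bmatrix}
   = \begin{bmatrix} 2.3 \\ 4.8 \\ 2.9 \end{bmatrix}.$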
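Some Special Matrices

The M x N matrix whose elements are all zero is called the zero matrix of dimen-
sion M x N and is denoted by

(10)    $0 = [0]_{M \times N}$.

When the dimension is clear, we use 0 to denote the zero matrix.

The identity matrix of order N is the square matrix given by

(11)    $I_N = [\delta_{ij}]_{N \times N}$ where $\delta_{ij} = \begin{cases} 1 & \text{when } i = j, \\ 0 & \text{when } i \ne j. \end{cases}$

It is the multiplicative identity, as illustrated in the next example.

Example 3.7. Let A be a 2 x 3 matrix. Then $I_2A = AI_3 = A$. Multiplication of A on
the left by $I_2$ results in

    $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
     \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}
   = \begin{bmatrix} a_{11} + 0 & a_{12} + 0 & a_{13} + 0 \\ a_{21} + 0 & a_{22} + 0 & a_{23} + 0 \end{bmatrix} = A.$

Multiplication of A on the right by $I_3$ results in

    $\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}
     \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
   = \begin{bmatrix} a_{11} + 0 + 0 & 0 + a_{12} + 0 & 0 + 0 + a_{13} \\ a_{21} + 0 + 0 & 0 + a_{22} + 0 & 0 + 0 + a_{23} \end{bmatrix} = A.$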

Some properties of matrix multiplication are given in the following theorem. 


Theorem 3.3 (Matrix Multiplication). Suppose that c is a scalar and that A, B,
and C are matrices such that the indicated sums and products are defined; then

(12)    $(AB)C = A(BC)$                  associativity of matrix multiplication

(13)    $IA = AI = A$                    identity matrix

(14)    $A(B + C) = AB + AC$             left distributive property

(15)    $(A + B)C = AC + BC$             right distributive property

(16)    $c(AB) = (cA)B = A(cB)$          scalar associative property

The Inverse of a Nonsingular Matrix 

The concept of an inverse applies to matrices, but special attention must be given. An
N x N matrix A is called nonsingular or invertible if there exists an N x N matrix B
such that

(17)    $AB = BA = I$.

If no such matrix B can be found, A is said to be singular. When B can be found
and (17) holds, we usually write $B = A^{-1}$ and use the familiar relation

(18)    $AA^{-1} = A^{-1}A = I$    if A is nonsingular.

It is easy to show that at most one matrix B can be found that satisfies relation (17).
Suppose that C is also an inverse of A (i.e., AC = CA = I). Then properties (12)
and (13) can be used to obtain

    $C = IC = (BA)C = B(AC) = BI = B$.


Determinants 

The determinant of a square matrix A is a scalar quantity (real number) and is denoted
by det(A) or |A|. If A is an N x N matrix

    $A = \begin{bmatrix}
    a_{11} & a_{12} & \cdots & a_{1N} \\
    a_{21} & a_{22} & \cdots & a_{2N} \\
    \vdots & \vdots &        & \vdots \\
    a_{N1} & a_{N2} & \cdots & a_{NN}
    \end{bmatrix}$,

then it is customary to write

    $\det(A) = \begin{vmatrix}
    a_{11} & a_{12} & \cdots & a_{1N} \\
    a_{21} & a_{22} & \cdots & a_{2N} \\
    \vdots & \vdots &        & \vdots \\
    a_{N1} & a_{N2} & \cdots & a_{NN}
    \end{vmatrix}$.

Although the notation for a determinant may look like a matrix, its properties are com-
pletely different. For one, the determinant is a scalar quantity (real number). The
definition of det(A) found in most linear algebra textbooks is not tractable for compu-
tation when N > 3. We will review how to compute determinants using the cofactor
expansion method. Evaluation of higher-order determinants is done using Gaussian
elimination and is mentioned in the body of Program 3.3.

If $A = [a_{ij}]$ is a 1 x 1 matrix, we define $\det(A) = a_{11}$. If $A = [a_{ij}]_{N \times N}$, where
$N \ge 2$, then let $M_{ij}$ be the determinant of the (N - 1) x (N - 1) submatrix of A obtained
by deleting the ith row and jth column of A. The determinant $M_{ij}$ is said to be the
minor of $a_{ij}$. The cofactor $A_{ij}$ of $a_{ij}$ is defined as $A_{ij} = (-1)^{i+j}M_{ij}$. Then the
determinant of an N x N matrix A is given by

(19)    $\det(A) = \sum_{j=1}^{N} a_{ij}A_{ij}$    (ith row expansion).
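
The ith row expansion (19) can be written out for a 3 x 3 matrix, where each minor is a 2 x 2 determinant. The following is a sketch under stated assumptions: the matrix entries and the choice i = 1 are illustrative, and the built-in `det` is used only as a cross-check.

```matlab
% Illustrative sketch of the ith row cofactor expansion (19) for a 3 x 3
% matrix with assumed entries; each minor is a 2 x 2 determinant.
A = [ 2  3  8;
     -4  5 -1;
      7 -6  9];
i = 1;                        % expand along the first row
N = size(A,1);
d = 0;
for j = 1:N
   rows = [1:i-1, i+1:N];     % delete row i
   cols = [1:j-1, j+1:N];     % delete column j
   S    = A(rows, cols);      % 2 x 2 submatrix
   Mij  = S(1,1)*S(2,2) - S(1,2)*S(2,1);   % minor of a_ij
   Aij  = (-1)^(i+j)*Mij;                  % cofactor of a_ij
   d    = d + A(i,j)*Aij;
end
fprintf('cofactor expansion: %g, built-in det: %g\n', d, det(A));
```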






Theorem 3.4. Assume that A is an N x N matrix. The following statements are 
equivalent. 

(21) Given any N x 1 matrix B, the linear system AX = B has a unique solution. 

(22) The matrix A is nonsingular (i.e., $A^{-1}$ exists).

(23) The system of equations AX = 0 has the unique solution X = 0.

(24) $\det(A) \ne 0$.
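Theorems 3.3 and 3.4 help relate matrix algebra to ordinary algebra. If state-
ment (21) is true, then statement (22) together with properties (12) and (13) give the
following line of reasoning:

(25)    $AX = B$ implies $A^{-1}AX = A^{-1}B$, which implies $X = A^{-1}B$.

Example 3.9. Use the inverse matrix

    $A^{-1} = \frac{1}{5}\begin{bmatrix} 4 & -1 \\ -7 & 3 \end{bmatrix}$

and the reasoning in (25) to solve the linear system AX = B:

    $AX = \begin{bmatrix} 3 & 1 \\ 7 & 4 \end{bmatrix}
          \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
        = \begin{bmatrix} 2 \\ 5 \end{bmatrix} = B.$

Using (25), we get

    $X = A^{-1}B = \frac{1}{5}\begin{bmatrix} 4 & -1 \\ -7 & 3 \end{bmatrix}
                   \begin{bmatrix} 2 \\ 5 \end{bmatrix}
                 = \frac{1}{5}\begin{bmatrix} 3 \\ 1 \end{bmatrix}
                 = \begin{bmatrix} 0.6 \\ 0.2 \end{bmatrix}.$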
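
Example 3.9 can be retraced in MATLAB. The sketch below first checks that the stated matrix really is the inverse, then solves the system two ways; the names Ainv, X1, and X2 are chosen here for illustration, and the backslash operator is MATLAB's built-in linear solver, which does not form the inverse explicitly.

```matlab
% Illustrative sketch for Example 3.9: solve AX = B with the stated inverse
% and compare with the built-in solver.
A    = [3 1; 7 4];
B    = [2; 5];
Ainv = (1/5)*[4 -1; -7 3];     % inverse matrix given in Example 3.9
X1   = Ainv*B;                 % reasoning (25): X = inv(A)*B
X2   = A\B;                    % built-in solver, no explicit inverse
disp(A*Ainv)                   % should display the 2 x 2 identity
disp([X1 X2])
```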


Remark. In practice we never numerically calculate the inverse of a nonsingular 
matrix or the determinant of a square matrix. These concepts are used as theoretical 
“tools” to establish the existence and uniqueness of solutions or as a means to alge¬ 
braically express the solution of a linear system (as in Example 3.9). 


Plane Rotations 

Suppose that A is a 3 x 3 matrix and $U = [x \; y \; z]'$ is a 3 x 1 matrix; then the product
V = AU is another 3 x 1 matrix. This is an example of a linear transformation, and
applications are found in the area of computer graphics. The matrix U is equivalent
to the positional vector U = (x, y, z), which represents the coordinates of a point in
three-dimensional space. Consider three special matrices:

(26)    $R_x(\alpha) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(\alpha) & -\sin(\alpha) \\ 0 & \sin(\alpha) & \cos(\alpha) \end{bmatrix}$,

(27)    $R_y(\beta) = \begin{bmatrix} \cos(\beta) & 0 & \sin(\beta) \\ 0 & 1 & 0 \\ -\sin(\beta) & 0 & \cos(\beta) \end{bmatrix}$,

(28)    $R_z(\gamma) = \begin{bmatrix} \cos(\gamma) & -\sin(\gamma) & 0 \\ \sin(\gamma) & \cos(\gamma) & 0 \\ 0 & 0 & 1 \end{bmatrix}$.



Table 3.1 Coordinates of the Vertices of a Cube under Successive Rotations

U             V = R_z(pi/4)U                 W = R_y(pi/6)R_z(pi/4)U
(0, 0, 0)'    (0.000000, 0.000000, 0)'       (0.000000, 0.000000, 0.000000)'
(1, 0, 0)'    (0.707107, 0.707107, 0)'       (0.612372, 0.707107, -0.353553)'
(0, 1, 0)'    (-0.707107, 0.707107, 0)'      (-0.612372, 0.707107, 0.353553)'
(0, 0, 1)'    (0.000000, 0.000000, 1)'       (0.500000, 0.000000, 0.866025)'
(1, 1, 0)'    (0.000000, 1.414214, 0)'       (0.000000, 1.414214, 0.000000)'
(1, 0, 1)'    (0.707107, 0.707107, 1)'       (1.112372, 0.707107, 0.512472)'
(0, 1, 1)'    (-0.707107, 0.707107, 1)'      (-0.112372, 0.707107, 1.219579)'
(1, 1, 1)'    (0.000000, 1.414214, 1)'       (0.500000, 1.414214, 0.866025)'


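These matrices $R_x(\alpha)$, $R_y(\beta)$, and $R_z(\gamma)$ are used to rotate points about the x-, y-,
and z-axes through the angles $\alpha$, $\beta$, and $\gamma$, respectively. The inverses are $R_x(-\alpha)$,
$R_y(-\beta)$, and $R_z(-\gamma)$ and they rotate space about the x-, y-, and z-axes through the
angles $-\alpha$, $-\beta$, and $-\gamma$, respectively. The next example illustrates the situation, and
further investigations are left for the reader.

Example 3.10. A unit cube is situated in the first octant with one vertex at the origin.
First, rotate the cube through an angle $\pi/4$ about the z-axis; then rotate this image through
an angle $\pi/6$ about the y-axis. Find the images of all eight vertices of the cube.

The first rotation is given by the transformation

    $V = R_z\left(\frac{\pi}{4}\right)U
       = \begin{bmatrix} \cos(\frac{\pi}{4}) & -\sin(\frac{\pi}{4}) & 0 \\ \sin(\frac{\pi}{4}) & \cos(\frac{\pi}{4}) & 0 \\ 0 & 0 & 1 \end{bmatrix}
         \begin{bmatrix} x \\ y \\ z \end{bmatrix}
       = \begin{bmatrix} 0.707107 & -0.707107 & 0.000000 \\ 0.707107 & 0.707107 & 0.000000 \\ 0.000000 & 0.000000 & 1.000000 \end{bmatrix}
         \begin{bmatrix} x \\ y \\ z \end{bmatrix}.$

Then the second rotation is given by

    $W = R_y\left(\frac{\pi}{6}\right)V
       = \begin{bmatrix} \cos(\frac{\pi}{6}) & 0 & \sin(\frac{\pi}{6}) \\ 0 & 1 & 0 \\ -\sin(\frac{\pi}{6}) & 0 & \cos(\frac{\pi}{6}) \end{bmatrix} V
       = \begin{bmatrix} 0.866025 & 0.000000 & 0.500000 \\ 0.000000 & 1.000000 & 0.000000 \\ -0.500000 & 0.000000 & 0.866025 \end{bmatrix} V.$

The composition of the two rotations is

    $W = R_y\left(\frac{\pi}{6}\right)R_z\left(\frac{\pi}{4}\right)U
       = \begin{bmatrix} 0.612372 & -0.612372 & 0.500000 \\ 0.707107 & 0.707107 & 0.000000 \\ -0.353553 & 0.353553 & 0.866025 \end{bmatrix}
         \begin{bmatrix} x \\ y \\ z \end{bmatrix}.$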
Figure 3.2 (a) The original starting cube. (b) $V = R_z(\pi/4)U$. Rotation about
the z-axis. (c) $W = R_y(\pi/6)V$. Rotation about the y-axis.


Numerical computations for the coordinates of the vertices of the starting cube are given in
Table 3.1 (as positional vectors), and the images of these cubes are shown in Figure 3.2(a)
through (c).
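
The numbers in Table 3.1 can be regenerated with a few lines of MATLAB. In the sketch below the eight vertices are stored as the columns of a 3 x 8 matrix U; the anonymous functions Rz and Ry and the other variable names are chosen here for illustration.

```matlab
% Illustrative sketch for Example 3.10: rotate the unit cube about the z-axis
% by pi/4 and then about the y-axis by pi/6. Vertices are stored as columns.
Rz = @(g) [cos(g) -sin(g) 0; sin(g) cos(g) 0; 0 0 1];
Ry = @(b) [cos(b) 0 sin(b); 0 1 0; -sin(b) 0 cos(b)];
U = [0 1 0 0 1 1 0 1;
     0 0 1 0 1 0 1 1;
     0 0 0 1 0 1 1 1];        % same vertex order as Table 3.1
V = Rz(pi/4)*U;               % first rotation
W = Ry(pi/6)*V;               % second rotation
M = Ry(pi/6)*Rz(pi/4);        % composition of the two rotations
disp([V' W'])                 % columns 1-3: V, columns 4-6: W
```

Applying the single matrix M to U gives the same result as applying the two rotations in turn, which is the associative property (12) at work.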

MATLAB 

The MATLAB functions det(A) and inv(A) calculate the determinant and inverse 
(if A is invertible), respectively, of a square matrix A. 

Example 3.11. Use MATLAB to solve the linear system in Example 3.6. Use the inverse
matrix method described in (25).

First we verify that A is nonsingular by showing that $\det(A) \ne 0$ (Theorem 3.4).

>>A=[0.125 0.200 0.400;0.375 0.500 0.600;0.500 0.300 0.000];
>>det(A)
ans=
   -0.0175

Following the reasoning in (25), the solution of AX = B is $X = A^{-1}B$.

>>X=inv(A)*[2.3 4.8 2.9]'
X=
   4.0000
   3.0000
   3.0000

We can check our solution by verifying that AX = B.

>>B=A*X
B=
   2.3000
   4.8000
   2.9000
(b) Show that

    $R_y(\beta)R_x(\alpha) = \begin{bmatrix}
    \cos(\beta) & \sin(\beta)\sin(\alpha) & \cos(\alpha)\sin(\beta) \\
    0 & \cos(\alpha) & -\sin(\alpha) \\
    -\sin(\beta) & \cos(\beta)\sin(\alpha) & \cos(\beta)\cos(\alpha)
    \end{bmatrix}.$

8. If A and B are nonsingular N x N matrices and C = AB, show that $C^{-1} = B^{-1}A^{-1}$.
Hint. Use the associative property of matrix multiplication.

9. Prove statements (13) and (16) of Theorem 3.3.

10. Let A be an M x N matrix and X an N x 1 matrix.

(a) How many multiplications are needed to calculate AX?

(b) How many additions are needed to calculate AX?

11. Let A be an M x N matrix, and let B and C be N x P matrices. Prove the left
distributive law for matrix multiplication: $A(B + C) = AB + AC$.

12. Let A and B be M x N matrices, and let C be an N x P matrix. Prove the right
distributive law for matrix multiplication: $(A + B)C = AC + BC$.

13. Find XX' and X'X, where $X = [1 \; -1 \; 2]$. Note. X' is the transpose of X.

14. Let A be an M x N matrix and B an N x P matrix. Prove that $(AB)' = B'A'$. Hint. Let
C = AB and show, using the definition of matrix multiplication, that the (i, j)th entry
of C' equals the (i, j)th entry of B'A'.

15. Use the result of Exercise 14 and the associative property of matrix multiplication to
show that $(ABC)' = C'B'A'$.

Algorithms and Programs 


The first column of Table 3.1 contains the coordinates of the vertices of a unit cube situated
in the first octant with one vertex at the origin. Note that all eight vertices can be stored in
a matrix V of dimension 8 x 3, where each row represents the coordinates of one of the
vertices. It follows from Exercise 14 that the product of V and the transpose of $R_z(\pi/4)$
will produce a matrix of dimension 8 x 3 (representing the second column of Table 3.1,
where each row represents the transformation of the corresponding row in V). Combining
this idea with Exercise 15, it follows that the coordinates of the vertices of a cube under
any number of successive rotations can be represented by a matrix product.
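
The row-storage idea in the paragraph above can be sketched as follows. The variable names Vtx, Vrot, and check are chosen here for illustration; the code reproduces only the second column of Table 3.1, which the text already lists.

```matlab
% Illustrative sketch: store the cube vertices as rows of an 8 x 3 matrix and
% rotate them by multiplying on the right by the transpose of Rz(pi/4).
Rz  = [cos(pi/4) -sin(pi/4) 0;
       sin(pi/4)  cos(pi/4) 0;
       0          0         1];
Vtx = [0 0 0; 1 0 0; 0 1 0; 0 0 1;
       1 1 0; 1 0 1; 0 1 1; 1 1 1];   % one vertex per row
Vrot  = Vtx*Rz';                      % rotated vertices, one per row
check = (Rz*Vtx')';                   % same result via column storage
disp(Vrot)
```

The equality of Vrot and check is the transpose rule $(AB)' = B'A'$ of Exercise 14 applied to the product of Rz with the transposed vertex matrix.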

1. A unit cube is situated in the first octant with one vertex at the origin. First, rotate
the cube through an angle of $\pi/6$ about the y-axis; then rotate this image through an
angle of $\pi/4$ about the z-axis. Find the images of all eight vertices of the starting
cube. Compare this result with the result in Example 3.10.
What is different? Explain your answer using the fact that, in general, matrix mul-
tiplication is not commutative. (See Figure 3.3(a) to (c).) Use the plot3 command to
plot each of the three cubes.

Figure 3.3 (a) The original starting cube. (b) $V = R_y(\pi/6)U$. Rotation about
the y-axis. (c) $W = R_z(\pi/4)V$. Rotation about the z-axis.

2. A unit cube is situated in the first octant with one vertex at the origin. First, rotate
the cube through an angle of $\pi/12$ about the x-axis; then rotate this image through
an angle of $\pi/6$ about the z-axis. Find the images of all eight vertices of the starting
cube. Use the plot3 command to plot each of the three cubes.

3. The tetrahedron with vertices at (0, 0, 0), (1, 0, 0), (0, 1, 0), and (0, 0, 1) is first ro-
tated through an angle of 0.15 radian about the y-axis, then through an angle of
-1.5 radians about the z-axis, and finally through an angle of 2.1 radians about the
x-axis. Find the images of all four vertices. Use the plot3 command to plot each of
the four images.


3.3 Upper-triangular Linear Systems

We will now develop the back-substitution algorithm, which is useful for solving a linear
system of equations that has an upper-triangular coefficient matrix. This algorithm
will be incorporated in the algorithm for solving a general linear system in Section 3.4.

Definition 3.2. An $N \times N$ matrix $A = [a_{ij}]$ is called upper triangular provided that
the elements satisfy $a_{ij} = 0$ whenever $i > j$. The $N \times N$ matrix $A = [a_{ij}]$ is called
lower triangular provided that $a_{ij} = 0$ whenever $i < j$.

We will develop a method for constructing the solution to upper-triangular linear 
systems of equations and leave the investigation of lower-triangular systems to the 
reader. If A is an upper-triangular matrix, then AX = B is said to be an upper-triangular
system of linear equations and has the form

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1,N-1}x_{N-1} + a_{1N}x_N &= b_1 \\
a_{22}x_2 + a_{23}x_3 + \cdots + a_{2,N-1}x_{N-1} + a_{2N}x_N &= b_2 \\
a_{33}x_3 + \cdots + a_{3,N-1}x_{N-1} + a_{3N}x_N &= b_3 \\
&\ \ \vdots \\
a_{N-1,N-1}x_{N-1} + a_{N-1,N}x_N &= b_{N-1} \\
a_{NN}x_N &= b_N.
\end{aligned}
\tag{1}
\]

Theorem 3.5 (Back Substitution). Suppose that AX = B is an upper-triangular
system with the form given in (1). If

\[ a_{kk} \neq 0 \quad \text{for } k = 1, 2, \ldots, N, \tag{2} \]

then there exists a unique solution to (1).

Proof. The last equation involves only $x_N$, so we solve it first:

\[ x_N = \frac{b_N}{a_{NN}}. \tag{3} \]

Now $x_N$ is known and it can be used in the next-to-last equation:

\[ x_{N-1} = \frac{b_{N-1} - a_{N-1,N}\,x_N}{a_{N-1,N-1}}. \tag{4} \]

Now $x_N$ and $x_{N-1}$ are used to find $x_{N-2}$:

\[ x_{N-2} = \frac{b_{N-2} - a_{N-2,N-1}\,x_{N-1} - a_{N-2,N}\,x_N}{a_{N-2,N-2}}. \tag{5} \]

Once the values $x_N, x_{N-1}, \ldots, x_{k+1}$ are known, the general step is

\[ x_k = \frac{b_k - \sum_{j=k+1}^{N} a_{kj}x_j}{a_{kk}} \quad \text{for } k = N-1, N-2, \ldots, 1. \tag{6} \]

The uniqueness of the solution is easy to see. The Nth equation implies that
$b_N/a_{NN}$ is the only possible value of $x_N$. Then finite induction is used to establish
that $x_{N-1}, x_{N-2}, \ldots, x_1$ are unique. •


Example 3.12. Use back substitution to solve the linear system

\[
\begin{aligned}
4x_1 - x_2 + 2x_3 + 3x_4 &= 20 \\
-2x_2 + 7x_3 - 4x_4 &= -7 \\
6x_3 + 5x_4 &= 4 \\
3x_4 &= 6.
\end{aligned}
\]
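As a numerical check on the system of Example 3.12, the general step (6) can be carried out directly in MATLAB. This is only a sketch: the loop is written inline so that it runs on its own, and the variable names A, B, X, and n are chosen for illustration.

```matlab
% Coefficient matrix and right-hand side of Example 3.12.
A = [4 -1 2 3; 0 -2 7 -4; 0 0 6 5; 0 0 0 3];
B = [20; -7; 4; 6];

n = length(B);
X = zeros(n,1);
X(n) = B(n)/A(n,n);                       % last equation involves only x_N
for k = n-1:-1:1
    % subtract the already-known terms, then divide by the diagonal entry
    X(k) = (B(k) - A(k,k+1:n)*X(k+1:n))/A(k,k);
end
disp(X')                                  % displays 3 -4 -1 2
```

Substituting these values back into each of the four equations reproduces the right-hand sides 20, -7, 4, and 6.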




(9) 
Using the last equation in (9), we must have $x_4 = 2$, which is substituted into the second
and third equations to get $x_3 = -1$, which checks out in both equations. But only two
values $x_3$ and $x_4$ have been obtained from the second through fourth equations, and when
they are substituted into the first equation of (9), the result is

\[ x_2 = 4x_1 - 16, \tag{10} \]

which has infinitely many solutions; hence (9) has infinitely many solutions. If we choose a
value of $x_1$ in (10), then the value of $x_2$ is uniquely determined. For example, if we include
the equation $x_1 = 2$ in the system (9), then from (10) we compute $x_2 = -8$. ■

Theorem 3.4 states that the linear system AX = B, where A is an $N \times N$ matrix,
has a unique solution if and only if $\det(A) \neq 0$. The following theorem states that
if any entry on the main diagonal of an upper- or lower-triangular matrix is zero, then
$\det(A) = 0$. Thus, by inspecting the coefficient matrices in the previous three examples,
it is clear that the system in Example 3.12 has a unique solution, and the systems
in Examples 3.13 and 3.14 do not have unique solutions. The proof of Theorem 3.6
can be found in most introductory linear algebra textbooks.


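Theorem 3.6. If the $N \times N$ matrix $A = [a_{ij}]$ is either upper or lower triangular, then

\[ \det(A) = a_{11}a_{22}\cdots a_{NN} = \prod_{i=1}^{N} a_{ii}. \tag{11} \]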

The value of the determinant for the coefficient matrix in Example 3.12 is $\det(A) =
4(-2)(6)(3) = -144$. The values of the determinants of the coefficient matrices in
Examples 3.13 and 3.14 are both $4(0)(6)(3) = 0$.
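Formula (11) is easy to confirm numerically. The short sketch below uses the coefficient matrix of Example 3.12 and compares the product of the diagonal entries with MATLAB's built-in det command; the variable names are illustrative.

```matlab
% Upper-triangular coefficient matrix from Example 3.12.
A = [4 -1 2 3; 0 -2 7 -4; 0 0 6 5; 0 0 0 3];

d = prod(diag(A));      % product of the main-diagonal entries, formula (11)
dBuiltin = det(A);      % general-purpose determinant, for comparison
disp([d dBuiltin])      % both entries are -144 (up to rounding in det)
```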

The following program will solve the upper-triangular system (1) by the method
of back substitution, provided $a_{kk} \neq 0$ for $k = 1, 2, \ldots, N$.


Program 3.1 (Back Substitution). To solve the upper-triangular system AX = B
by the method of back substitution. Proceed with the method only if all the diagonal
elements are nonzero. First compute $x_N = b_N/a_{NN}$ and then use the rule

\[ x_k = \frac{b_k - \sum_{j=k+1}^{N} a_{kj}x_j}{a_{kk}} \quad \text{for } k = N-1, N-2, \ldots, 1. \]


function X=backsub(A,B)
%Input  - A is an n x n upper-triangular nonsingular matrix
%       - B is an n x 1 matrix
%Output - X is the solution to the linear system AX = B

%Find the dimension of B and initialize X
n=length(B);
X=zeros(n,1);
X(n)=B(n)/A(n,n);
for k=n-1:-1:1
   X(k)=(B(k)-A(k,k+1:n)*X(k+1:n))/A(k,k);
end


Exercises for Upper-triangular Linear Systems

In Exercises 1 through 3, solve the upper-triangular system and find the value of the
determinant of the coefficient matrix.

1.
\[
\begin{aligned}
3x_1 - 2x_2 + x_3 - x_4 &= 8 \\
4x_2 - x_3 + 2x_4 &= -3 \\
2x_3 + 3x_4 &= 11 \\
5x_4 &= 15
\end{aligned}
\]

2.
\[
\begin{aligned}
5x_1 - 3x_2 - 7x_3 + x_4 &= -\ldots \\
11x_2 + 9x_3 + 5x_4 &= 22 \\
3x_3 - 13x_4 &= -11 \\
7x_4 &= 14
\end{aligned}
\]

3.
\[
\begin{aligned}
4x_1 - x_2 + 2x_3 + 2x_4 - x_5 &= 4 \\
-2x_2 + 6x_3 + 2x_4 + 7x_5 &= 0 \\
x_3 - x_4 - 2x_5 &= 3 \\
-2x_4 - x_5 &= 10 \\
3x_5 &= 6
\end{aligned}
\]

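4. (a) Consider the two upper-triangular matrices

\[
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ 0 & b_{22} & b_{23} \\ 0 & 0 & b_{33} \end{bmatrix}.
\]

Show that their product C = AB is also upper triangular.

(b) Let A and B be two $N \times N$ upper-triangular matrices. Show that their product
is also upper triangular.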

5. Solve the lower-triangular system AX = B and find det(A).

\[
\begin{aligned}
2x_1 &= 6 \\
-x_1 + 4x_2 &= 5 \\
3x_1 - 2x_2 - x_3 &= 4 \\
x_1 - 2x_2 + 6x_3 + 3x_4 &= 2
\end{aligned}
\]

6. Solve the lower-triangular system AX = B and find det(A).

\[
\begin{aligned}
3x_1 + 4x_2 + 2x_3 &= 2 \\
-x_1 + 3x_2 - 6x_3 - x_4 &= 5
\end{aligned}
\]



3.4 Gaussian Elimination and Pivoting

In this section we develop a scheme for solving a general system AX = B of N 
equations and N unknowns. The goal is to construct an equivalent upper-triangular 
system UX = Y that can be solved by the method of Section 3.3.

Two linear systems of dimension N x N are said to be equivalent provided that 
their solution sets are the same. Theorems from linear algebra show that when certain 
transformations are applied to a given system the solution sets do not change. 
Theorem 3.7 (Elementary Transformations). The following operations applied to
a linear system yield an equivalent system:

(1) Interchanges: The order of two equations can be changed.

(2) Scaling: Multiplying an equation by a nonzero constant.

(3) Replacement: An equation can be replaced by the sum of itself and
a nonzero multiple of any other equation.

It is common to use (3) by replacing an equation with the difference of that equation
and a multiple of another equation. These concepts are illustrated in the next
example.

Example 3.15. Find the parabola $y = A + Bx + Cx^2$ that passes through the three points
(1, 1), (2, -1), and (3, 1).

For each point we obtain an equation relating the value of x to the value of y. The
result is the linear system

\[
\begin{aligned}
A + B + C &= 1 && \text{at } (1, 1) \\
A + 2B + 4C &= -1 && \text{at } (2, -1) \\
A + 3B + 9C &= 1 && \text{at } (3, 1).
\end{aligned}
\tag{4}
\]

The variable A is eliminated from the second and third equations by subtracting the
first equation from them. This is an application of the replacement transformation (3), and
the resulting equivalent linear system is

\[
\begin{aligned}
A + B + C &= 1 \\
B + 3C &= -2 \\
2B + 8C &= 0.
\end{aligned}
\tag{5}
\]

The variable B is eliminated from the third equation in (5) by subtracting from it two times
the second equation. We arrive at the equivalent upper-triangular system:

\[
\begin{aligned}
A + B + C &= 1 \\
B + 3C &= -2 \\
2C &= 4.
\end{aligned}
\tag{6}
\]

The back-substitution algorithm is now used to find the coefficients $C = 4/2 = 2$, $B =
-2 - 3(2) = -8$, and $A = 1 - (-8) - 2 = 7$, and the equation of the parabola is
$y = 7 - 8x + 2x^2$. ■
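The result of Example 3.15 can be checked with a few lines of MATLAB. The sketch below builds the coefficient matrix of system (4) from the three data points and solves it with the backslash operator; the names x, y, M, and coef are illustrative and are not used elsewhere in the text.

```matlab
% Data points of Example 3.15.
x = [1; 2; 3];
y = [1; -1; 1];

% Each row of M is [1 x x^2], so M*[A; B; C] = y is system (4).
M = [ones(3,1) x x.^2];
coef = M \ y;              % coef = [A; B; C]
disp(coef')                % displays 7 -8 2 (up to rounding)
```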

It is efficient to store all the coefficients of the linear system AX = B in an array
of dimension $N \times (N+1)$. The coefficients of B are stored in column N + 1 of the
array (i.e., $a_{k,N+1} = b_k$). Each row contains all the coefficients necessary to represent
an equation in the linear system. The augmented matrix is denoted $[A|B]$ and the
linear system is represented as follows:

\[
[A|B] = \left[\begin{array}{cccc|c}
a_{11} & a_{12} & \cdots & a_{1N} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2N} & b_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{N1} & a_{N2} & \cdots & a_{NN} & b_N
\end{array}\right].
\tag{7}
\]

The system AX = B, with augmented matrix given in (7), can be solved by performing
row operations on the augmented matrix $[A|B]$. The variables $x_k$ are placeholders
for the coefficients and can be omitted until the end of the calculation.

Theorem 3.8 (Elementary Row Operations). The following operations applied to
the augmented matrix (7) yield an equivalent linear system.

(8) Interchanges: The order of two rows can be changed.

(9) Scaling: Multiplying a row by a nonzero constant.

(10) Replacement: The row can be replaced by the sum of that row and
a nonzero multiple of any other row; that is,
$\text{row}_r = \text{row}_r - m_{rp} \times \text{row}_p$.

It is common to use (10) by replacing a row with the difference of that row and a
multiple of another row.

Definition 3.3 (Pivot). The number $a_{rr}$ in the coefficient matrix A that is used to
eliminate $a_{kr}$, where $k = r+1, r+2, \ldots, N$, is called the rth pivotal element, and
the rth row is called the pivot row.

The following example illustrates how to use the operations in Theorem 3.8 to
obtain an equivalent upper-triangular system UX = Y from a linear system AX = B,
where A is an $N \times N$ matrix.

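Example 3.16. Express the following system in augmented matrix form and find an
equivalent upper-triangular system and the solution.

\[
\begin{aligned}
x_1 + 2x_2 + x_3 + 4x_4 &= 13 \\
2x_1 + 0x_2 + 4x_3 + 3x_4 &= 28 \\
4x_1 + 2x_2 + 2x_3 + x_4 &= 20 \\
-3x_1 + x_2 + 3x_3 + 2x_4 &= 6.
\end{aligned}
\]

The augmented matrix is

\[
\begin{array}{r}
\text{pivot} \rightarrow \\ m_{21} = 2 \\ m_{31} = 4 \\ m_{41} = -3
\end{array}
\left[\begin{array}{rrrr|r}
1 & 2 & 1 & 4 & 13 \\
2 & 0 & 4 & 3 & 28 \\
4 & 2 & 2 & 1 & 20 \\
-3 & 1 & 3 & 2 & 6
\end{array}\right].
\]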
The first row is used to eliminate elements in the first column below the diagonal.
We refer to the first row as the pivotal row and the element $a_{11} = 1$ is called the pivotal
element. The values $m_{k1}$ are the multiples of row 1 that are to be subtracted from row k for
k = 2, 3, 4. The result after elimination is

\[
\begin{array}{r}
\\ \text{pivot} \rightarrow \\ m_{32} = 1.5 \\ m_{42} = -1.75
\end{array}
\left[\begin{array}{rrrr|r}
1 & 2 & 1 & 4 & 13 \\
0 & -4 & 2 & -5 & 2 \\
0 & -6 & -2 & -15 & -32 \\
0 & 7 & 6 & 14 & 45
\end{array}\right].
\]

The second row is used to eliminate elements in the second column that lie below the
diagonal. The second row is the pivotal row and the values $m_{k2}$ are the multiples of row 2
that are to be subtracted from row k for k = 3, 4. The result after elimination is

\[
\begin{array}{r}
\\ \\ \text{pivot} \rightarrow \\ m_{43} = -1.9
\end{array}
\left[\begin{array}{rrrr|r}
1 & 2 & 1 & 4 & 13 \\
0 & -4 & 2 & -5 & 2 \\
0 & 0 & -5 & -7.5 & -35 \\
0 & 0 & 9.5 & 5.25 & 48.5
\end{array}\right].
\]

Finally, the multiple $m_{43} = -1.9$ of the third row is subtracted from the fourth row, and
the result is the upper-triangular system

\[
\left[\begin{array}{rrrr|r}
1 & 2 & 1 & 4 & 13 \\
0 & -4 & 2 & -5 & 2 \\
0 & 0 & -5 & -7.5 & -35 \\
0 & 0 & 0 & -9 & -18
\end{array}\right].
\tag{11}
\]

The back-substitution algorithm can be used to solve (11), and we get

$x_4 = 2$, $x_3 = 4$, $x_2 = -1$, $x_1 = 3$. ■

The process described above is called Gaussian elimination and must be modified
so that it can be used in most circumstances. If $a_{kk} = 0$, row k cannot be used to
eliminate the elements in column k, and row k must be interchanged with some row
below the diagonal to obtain a nonzero pivot element. If this cannot be done, then the
coefficient matrix of the system of linear equations is singular, and the system does
not have a unique solution.

Theorem 3.9 (Gaussian Elimination with Back Substitution). If A is an $N \times N$
nonsingular matrix, then there exists a system UX = Y, equivalent to AX = B, where
U is an upper-triangular matrix with $u_{kk} \neq 0$. After U and Y are constructed, back
substitution can be used to solve UX = Y for X.
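The new elements are written $a_{rc}^{(2)}$ to indicate that this is the second time that a
number has been stored in the matrix at location (r, c). The result after step 2 is

\[
\left[\begin{array}{ccccc|c}
a_{11}^{(1)} & a_{12}^{(1)} & a_{13}^{(1)} & \cdots & a_{1N}^{(1)} & a_{1,N+1}^{(1)} \\
0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2N}^{(2)} & a_{2,N+1}^{(2)} \\
0 & a_{32}^{(2)} & a_{33}^{(2)} & \cdots & a_{3N}^{(2)} & a_{3,N+1}^{(2)} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & a_{N2}^{(2)} & a_{N3}^{(2)} & \cdots & a_{NN}^{(2)} & a_{N,N+1}^{(2)}
\end{array}\right].
\]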

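Step 3. If necessary, switch the second row with some row below it so that
$a_{22}^{(2)} \neq 0$; then eliminate $x_2$ in rows 3 through N. In this process, $m_{r2}$ is the
multiple of row 2 that is subtracted from row r.

    for r = 3 : N
        $m_{r2} = a_{r2}^{(2)} / a_{22}^{(2)}$;
        $a_{r2}^{(3)} = 0$;
        for c = 3 : N + 1
            $a_{rc}^{(3)} = a_{rc}^{(2)} - m_{r2} * a_{2c}^{(2)}$;
        end
    end

The new elements are written $a_{rc}^{(3)}$ to indicate that this is the third time that a
number has been stored in the matrix at location (r, c). The result after step 3 is

\[
\left[\begin{array}{ccccc|c}
a_{11}^{(1)} & a_{12}^{(1)} & a_{13}^{(1)} & \cdots & a_{1N}^{(1)} & a_{1,N+1}^{(1)} \\
0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2N}^{(2)} & a_{2,N+1}^{(2)} \\
0 & 0 & a_{33}^{(3)} & \cdots & a_{3N}^{(3)} & a_{3,N+1}^{(3)} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & a_{N3}^{(3)} & \cdots & a_{NN}^{(3)} & a_{N,N+1}^{(3)}
\end{array}\right].
\]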


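Step p + 1. This is the general step. If necessary, switch row p with some row
beneath it so that $a_{pp}^{(p)} \neq 0$; then eliminate $x_p$ in rows p + 1 through N. Here $m_{rp}$ is
the multiple of row p that is subtracted from row r.

    for r = p + 1 : N
        $m_{rp} = a_{rp}^{(p)} / a_{pp}^{(p)}$;
        $a_{rp}^{(p+1)} = 0$;
        for c = p + 1 : N + 1
            $a_{rc}^{(p+1)} = a_{rc}^{(p)} - m_{rp} * a_{pc}^{(p)}$;
        end
    end

The final result after $x_{N-1}$ has been eliminated from row N is

\[
\left[\begin{array}{ccccc|c}
a_{11}^{(1)} & a_{12}^{(1)} & a_{13}^{(1)} & \cdots & a_{1N}^{(1)} & a_{1,N+1}^{(1)} \\
0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2N}^{(2)} & a_{2,N+1}^{(2)} \\
0 & 0 & a_{33}^{(3)} & \cdots & a_{3N}^{(3)} & a_{3,N+1}^{(3)} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & a_{NN}^{(N)} & a_{N,N+1}^{(N)}
\end{array}\right].
\]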


The upper-triangularization process is now complete.

Since A is nonsingular, when row operations are performed the successive matrices
are also nonsingular. This guarantees that $a_{kk}^{(k)} \neq 0$ for all k in the construction process.
Hence back substitution can be used to solve UX = Y for X, and the theorem is proved.
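The general step of the construction can be sketched in MATLAB and applied to the system of Example 3.16. This is an illustrative fragment rather than Program 3.2: it performs no row interchanges, so it assumes every pivot encountered is nonzero (which is the case for this system), and the loop variables p, r, and m mirror the notation of the proof.

```matlab
% Augmented matrix [A|B] of Example 3.16.
Aug = [1 2 1 4 13; 2 0 4 3 28; 4 2 2 1 20; -3 1 3 2 6];
N = size(Aug,1);

for p = 1:N-1                        % step p+1 eliminates x_p below row p
    for r = p+1:N
        m = Aug(r,p)/Aug(p,p);       % multiplier m_rp (pivot assumed nonzero)
        Aug(r,p:N+1) = Aug(r,p:N+1) - m*Aug(p,p:N+1);
    end
end

U = Aug(:,1:N);                      % upper-triangular coefficient matrix
Y = Aug(:,N+1);                      % transformed right-hand side
X = U \ Y;                           % backslash used here in place of backsub
disp(Aug)
disp(X')
```

The final Aug agrees with the matrix in (11) up to rounding, and X reproduces the solution of Example 3.16.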


Pivoting to Avoid $a_{pp}^{(p)} = 0$

If $a_{pp}^{(p)} = 0$, row p cannot be used to eliminate the elements in column p below the
main diagonal. It is necessary to find row k, where $a_{kp}^{(p)} \neq 0$ and $k > p$, and then
interchange row p and row k so that a nonzero pivot element is obtained. This process is
called pivoting, and the criterion for deciding which row to choose is called a pivoting
strategy. The trivial pivoting strategy is as follows. If $a_{pp}^{(p)} \neq 0$, do not switch rows.
If $a_{pp}^{(p)} = 0$, locate the first row below p in which $a_{kp}^{(p)} \neq 0$ and switch rows k and p.
This will result in a new element $a_{pp}^{(p)} \neq 0$, which is a nonzero pivot element.
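The trivial pivoting strategy amounts to a search down column p followed by a row swap. The fragment below is a sketch under stated assumptions: the 3 x 4 augmented matrix is a made-up example with a zero in the (1,1) position, and only the first column (p = 1) is treated.

```matlab
% Hypothetical augmented matrix whose first pivot candidate is zero.
Aug = [0 2 1 5; 1 1 1 3; 2 1 0 4];
N = size(Aug,1);
p = 1;

if Aug(p,p) == 0
    % index of the first row at or below p with a nonzero entry in column p
    k = p - 1 + find(Aug(p:N,p) ~= 0, 1);
    if isempty(k)
        disp('No nonzero pivot is available in this column.')
    else
        Aug([p k],:) = Aug([k p],:);     % interchange rows p and k
    end
end
disp(Aug)
```

Here the search stops at row 2, so rows 1 and 2 are interchanged even though row 3 holds the larger entry; the next subsection explains why the larger entry is usually the better choice.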


Pivoting to Reduce Error 

Because the computer uses fixed-precision arithmetic, it is possible that a small error 
will be introduced each time that an arithmetic operation is performed. The following 
example illustrates how the use of the trivial pivoting strategy in Gaussian elimination 
can lead to significant error in the solution of a linear system of equations. 
Example 3.17. The values $x_1 = x_2 = 1.000$ are the solutions to

\[
\begin{aligned}
1.133x_1 + 5.281x_2 &= 6.414 \\
24.14x_1 - 1.210x_2 &= 22.93.
\end{aligned}
\tag{12}
\]

Use four-digit arithmetic (see Exercises 6 and 7 in Section 1.3) and Gaussian elimination
with trivial pivoting to find a computed approximate solution to the system.

The multiple $m_{21} = 24.14/1.133 = 21.31$ of row 1 is to be subtracted from row 2 to
obtain the upper-triangular system. Using four digits in the calculations, we obtain the new
coefficients

\[
\begin{aligned}
a_{22}^{(2)} &= -1.210 - 21.31(5.281) = -1.210 - 112.5 = -113.7 \\
a_{23}^{(2)} &= 22.93 - 21.31(6.414) = 22.93 - 136.7 = -113.8.
\end{aligned}
\]

The computed upper-triangular system is

\[
\begin{aligned}
1.133x_1 + 5.281x_2 &= 6.414 \\
-113.7x_2 &= -113.8.
\end{aligned}
\]

Back substitution is used to compute $x_2 = -113.8/(-113.7) = 1.001$, and $x_1 = (6.414 -
5.281(1.001))/(1.133) = (6.414 - 5.286)/(1.133) = 0.9956$. ■

The error in the solution of the linear system (12) is due to the magnitude of the
multiplier $m_{21} = 21.31$. In the next example the magnitude of the multiplier $m_{21}$ is
reduced by first interchanging the first and second equations in the linear system (12)
and then using the trivial pivoting strategy in Gaussian elimination to solve the system.

Example 3.18. Use four-digit arithmetic and Gaussian elimination with trivial pivoting
to solve the linear system

\[
\begin{aligned}
24.14x_1 - 1.210x_2 &= 22.93 \\
1.133x_1 + 5.281x_2 &= 6.414.
\end{aligned}
\]

This time $m_{21} = 1.133/24.14 = 0.04693$ is the multiple of row 1 that is to be subtracted
from row 2. The new coefficients are

\[
\begin{aligned}
a_{22}^{(2)} &= 5.281 - 0.04693(-1.210) = 5.281 + 0.05679 = 5.338 \\
a_{23}^{(2)} &= 6.414 - 0.04693(22.93) = 6.414 - 1.076 = 5.338.
\end{aligned}
\]

The computed upper-triangular system is

\[
\begin{aligned}
24.14x_1 - 1.210x_2 &= 22.93 \\
5.338x_2 &= 5.338.
\end{aligned}
\]

Back substitution is used to compute $x_2 = 5.338/5.338 = 1.000$, and $x_1 = (22.93 +
1.210(1.000))/(24.14) = 1.000$. ■
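The four-digit computations of Examples 3.17 and 3.18 can be imitated in MATLAB by rounding every intermediate result to four significant digits. The sketch below makes several assumptions: the helper fl is a hypothetical rounding function written for this illustration (it is not defined for a zero argument), and the order of the rounded operations follows the hand calculations shown in the two examples.

```matlab
% fl(x): round a nonzero x to four significant decimal digits (illustrative helper).
fl = @(x) round(x ./ 10.^(floor(log10(abs(x))) - 3)) .* 10.^(floor(log10(abs(x))) - 3);

% Augmented matrices: equation order of Example 3.17, then of Example 3.18.
systems = {[1.133 5.281 6.414; 24.14 -1.210 22.93], ...
           [24.14 -1.210 22.93; 1.133 5.281 6.414]};

X = zeros(2,2);                              % row s holds [x1 x2] for system s
for s = 1:2
    G = systems{s};
    m   = fl(G(2,1)/G(1,1));                 % multiplier m_21
    a22 = fl(G(2,2) - fl(m*G(1,2)));         % new coefficient of x_2 in row 2
    b2  = fl(G(2,3) - fl(m*G(1,3)));         % new right-hand side in row 2
    x2  = fl(b2/a22);
    x1  = fl(fl(G(1,3) - fl(G(1,2)*x2))/G(1,1));
    X(s,:) = [x1 x2];
end
fprintf('%.4f  %.4f\n', X');
```

The first row of X reproduces the values 0.9956 and 1.001 of Example 3.17, while the second row gives 1.000 for both unknowns, as in Example 3.18.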
The purpose of a pivoting strategy is to move the entry of greatest magnitude to
the main diagonal and then use it to eliminate the remaining entries in the column. If
there is more than one nonzero element in column p that lies on or below the main
diagonal, then there is a choice to determine which rows to interchange. The partial
pivoting strategy, illustrated in Example 3.18, is the most common one and is used in
Program 3.2. To reduce the propagation of error, it is suggested that one check the
magnitude of all the elements in column p that lie on or below the main diagonal.
Locate row k in which the element that has the largest absolute value lies, that is,

\[ |a_{kp}| = \max\{|a_{pp}|, |a_{p+1,p}|, \ldots, |a_{N-1,p}|, |a_{Np}|\}, \]

and then switch row p with row k if k > p. Now, each of the multipliers $m_{rp}$ for
$r = p+1, \ldots, N$ will be less than or equal to 1 in absolute value. This process will
usually keep the relative magnitudes of the elements of the matrix U in Theorem 3.9
the same as those in the original coefficient matrix A. Usually, the choice of the larger
pivot element in partial pivoting will result in a smaller error being propagated.

In Section 3.5 we will find that it takes a total of $(4N^3 + 9N^2 - 7N)/6$ arithmetic
operations to solve an $N \times N$ system. When N = 20, the total number of arithmetic
operations that must be performed is 5910, and the propagation of error in the
computations could result in an erroneous answer. The technique of scaled partial pivoting
or equilibrating can be used to further reduce the effect of error propagation. In scaled
partial pivoting we search all the elements in column p that lie on or below the main
diagonal for the one that is largest relative to the entries in its row. First search rows p
through N for the largest element in magnitude in each row, say $s_r$:

\[ s_r = \max\{|a_{rp}|, |a_{r,p+1}|, \ldots, |a_{rN}|\} \quad \text{for } r = p, p+1, \ldots, N. \tag{13} \]

The pivotal row k is determined by finding

\[ \frac{|a_{kp}|}{s_k} = \max\left\{\frac{|a_{pp}|}{s_p}, \frac{|a_{p+1,p}|}{s_{p+1}}, \ldots, \frac{|a_{Np}|}{s_N}\right\}. \tag{14} \]

Now interchange row p and k, unless p = k. Again, this pivoting process is designed
to keep the relative magnitudes of the elements in the matrix U in Theorem 3.9 the
same as those in the original coefficient matrix A.
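The two row-selection rules can be compared on a small example. The sketch below uses the coefficient matrix that appears later in Exercise 14(a) and looks only at the first column (p = 1); the variable names kPartial, kScaled, and s are illustrative.

```matlab
% Coefficient matrix of Exercise 14(a); only column p = 1 is examined.
A = [2 -3 100; 1 10 -0.001; 3 -100 0.01];
N = size(A,1);
p = 1;

% Partial pivoting: largest |a_rp| on or below the diagonal.
[~, jp] = max(abs(A(p:N,p)));
kPartial = jp + p - 1;

% Scaled partial pivoting: largest |a_rp|/s_r, with s_r the row maximum of (13).
s = max(abs(A(p:N,p:N)), [], 2);
[~, js] = max(abs(A(p:N,p)) ./ s);
kScaled = js + p - 1;

fprintf('partial pivoting picks row %d, scaled partial pivoting picks row %d\n', ...
        kPartial, kScaled);
```

Partial pivoting selects row 3 because 3 is the largest entry in the column, whereas scaled partial pivoting selects row 2, whose leading entry is largest relative to the other entries in its own row.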

Ill Conditioning

A matrix A is called ill conditioned if there exists a matrix B for which small perturbations
in the coefficients of A or B will produce large changes in $X = A^{-1}B$. The
system AX = B is said to be ill conditioned when A is ill conditioned. In this case,
numerical methods for computing an approximate solution are prone to have more
error.

One circumstance involving ill conditioning occurs when A is "nearly singular"
and the determinant of A is close to zero. Ill conditioning can also occur in systems
of two equations when two lines are nearly parallel (or in three equations when three
planes are nearly parallel). A consequence of ill conditioning is that erroneous values,
when substituted into the equations, may appear to be genuine solutions. For example,
consider the two equations

\[
\begin{aligned}
x + 2y - 2.00 &= 0 \\
2x + 3y - 3.40 &= 0.
\end{aligned}
\tag{15}
\]

Substitution of $x_0 = 1.00$ and $y_0 = 0.48$ into these equations "almost produces zeros":

\[
\begin{aligned}
1 + 2(0.48) - 2.00 &= 1.96 - 2.00 = -0.04 \approx 0 \\
2 + 3(0.48) - 3.40 &= 3.44 - 3.40 = 0.04 \approx 0.
\end{aligned}
\]

Here the discrepancy from 0 is only ±0.04. However, the true solution to this linear
system is x = 0.8 and y = 0.6, so the errors in the approximate solution are
$x - x_0 = 0.80 - 1.00 = -0.20$ and $y - y_0 = 0.60 - 0.48 = 0.12$. Thus, merely
substituting values into a set of equations is not a reliable test for accuracy. The
rhombus-shaped region R in Figure 3.4 represents a set where both equations in (15) are
"almost satisfied":

\[ R = \{(x, y) : |x + 2y - 2.00| < 0.1 \text{ and } |2x + 3y - 3.40| < 0.2\}. \]

There are points in R that are far away from the solution point (0.8, 0.6) and yet
produce small values when substituted into the equations in (15). If it is suspected
that a linear system is ill conditioned, computations should be carried out in
multiple-precision arithmetic. The interested reader should research the topic of condition
number of a matrix to get more information on this phenomenon.
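The distinction between a small residual and a small error in system (15) can be reproduced numerically. The following is a minimal sketch: the names residual, err, and c are illustrative, and cond is MATLAB's built-in estimate of the condition number mentioned above.

```matlab
% System (15) written as A*[x; y] = B.
A = [1 2; 2 3];
B = [2.00; 3.40];

X0 = [1.00; 0.48];           % the approximate solution tested in the text
residual = A*X0 - B;         % what substitution into the equations measures
Xtrue = A \ B;               % computed solution of the system
err = Xtrue - X0;            % actual error in the approximate solution
c = cond(A);                 % 2-norm condition number of A

disp([residual err])
disp(c)
```

The residual entries are -0.04 and 0.04, while the error entries are -0.20 and 0.12, several times larger in magnitude.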

Ill conditioning has more drastic consequences when several equations are involved.
Consider the problem of finding the cubic polynomial $y = c_1x^3 + c_2x^2 +
c_3x + c_4$ that passes through the four points (2, 8), (3, 27), (4, 64), and (5, 125) (clearly,
$y = x^3$ is the desired cubic polynomial). In Chapter 5 we will introduce the method
of least squares. Applying the method of least squares to find the coefficients requires
that the following linear system be solved:

\[
\begin{bmatrix}
20{,}514 & 4{,}424 & 978 & 224 \\
4{,}424 & 978 & 224 & 54 \\
978 & 224 & 54 & 14 \\
224 & 54 & 14 & 4
\end{bmatrix}
\begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix}
=
\begin{bmatrix} 20{,}514 \\ 4{,}424 \\ 978 \\ 224 \end{bmatrix}.
\]

A computer that carried nine digits of precision was used to compute the coefficients
and obtained

$c_1 = 1.000004$, $c_2 = -0.000038$, $c_3 = 0.000126$, and $c_4 = -0.000131$.

Although this computation is close to the true solution, $c_1 = 1$ and $c_2 = c_3 = c_4 = 0$, it
shows how easy it is for error to creep into the solution. Furthermore, suppose that the
coefficient $a_{11} = 20{,}514$ in the upper-left corner of the coefficient matrix is changed
to the value 20,515 and the perturbed system is solved. Values obtained with the same
computer were

$c_1 = 0.642857$, $c_2 = 3.75000$, $c_3 = -12.3928$, and $c_4 = 12.7500$,

which is a worthless answer. Ill conditioning is not easy to detect. If the system is
solved a second time with slightly perturbed coefficients and an answer that differs
significantly from the first one is discovered, then it is realized that ill conditioning
is present. Sensitivity analysis is a topic normally introduced in advanced numerical
analysis texts.
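The perturbation experiment just described is easy to repeat in MATLAB's double-precision arithmetic. The sketch below rebuilds the coefficient matrix from the four data points (an assumption about how the entries arise, which does reproduce the numbers printed above) and then solves both the original and the perturbed systems; the names V, Ap, c, and cp are illustrative.

```matlab
% Abscissas of the four data points; the ordinates are x.^3.
x = (2:5)';
V = [x.^3 x.^2 x ones(4,1)];     % one row [x^3 x^2 x 1] per data point

A = V'*V;                        % 4 x 4 coefficient matrix, A(1,1) = 20514
B = V'*(x.^3);                   % right-hand side [20514; 4424; 978; 224]
c = A \ B;                       % close to [1; 0; 0; 0]

Ap = A;
Ap(1,1) = A(1,1) + 1;            % change 20,514 to 20,515
cp = Ap \ B;                     % solution of the perturbed system

disp([c cp])
```

The second column of the display is far from the first, even though only one coefficient was changed by about 0.005 percent.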


In Program 3.2 the MATLAB statement [A B] is used to construct the augmented
matrix for the linear system AX = B, and the max command is used to determine
the pivot element in partial pivoting. Once the equivalent triangulated matrix [U|Y]
is obtained it is separated into U and Y, and Program 3.1 is used to carry out back
substitution (backsub(U,Y)). The use of these commands and processes is illustrated
in the following example.

Example 3.19. (a) Use MATLAB to construct the augmented matrix for the linear system
in Example 3.16; (b) use the max command to find the element of greatest magnitude in the
first column of the coefficient matrix A; and (c) break the augmented matrix in (11) into
the coefficient matrix U and constant matrix Y of the upper-triangular system UX = Y.

(a)

>> A=[1 2 1 4;2 0 4 3;4 2 2 1;-3 1 3 2];
>> B=[13 28 20 6]';
>> Aug=[A B]
Aug=
     1     2     1     4    13
     2     0     4     3    28
     4     2     2     1    20
    -3     1     3     2     6

(b) In the following MATLAB display, a is the element of greatest magnitude in the first
column of A and j is the row number.

>> [a,j]=max(abs(A(1:4,1)))
a=
     4
j=
     3

(c) Let Augup = [U|Y] be the upper-triangular matrix in (11).

>> Augup=[1 2 1 4 13;0 -4 2 -5 2;0 0 -5 -7.5 -35;0 0 0 -9 -18];
>> U=Augup(1:4,1:4)
U=
    1.0000    2.0000    1.0000    4.0000
         0   -4.0000    2.0000   -5.0000
         0         0   -5.0000   -7.5000
         0         0         0   -9.0000

>> Y=Augup(1:4,5)
Y=
    13
     2
   -35
   -18


Program 3.2 (Upper Triangularization Followed by Back Substitution). To
construct the solution to AX = B, by first reducing the augmented matrix [A|B] to
upper-triangular form and then performing back substitution.

function X = uptrbk(A,B)
%Input  - A is an N x N nonsingular matrix
%       - B is an N x 1 matrix
%Output - X is an N x 1 matrix containing the solution to AX=B.

%Initialize X and the temporary storage matrix C
N=size(A,1);
X=zeros(N,1);
C=zeros(1,N+1);

%Form the augmented matrix: Aug=[A|B]
Aug=[A B];

for p=1:N-1
   %Partial pivoting for column p
   [Y,j]=max(abs(Aug(p:N,p)));
   %Interchange row p and j
   C=Aug(p,:);
   Aug(p,:)=Aug(j+p-1,:);
   Aug(j+p-1,:)=C;
   if Aug(p,p)==0
      disp('A was singular. No unique solution')
      break
   end
   %Elimination process for column p
   for k=p+1:N
      m=Aug(k,p)/Aug(p,p);
      Aug(k,p:N+1)=Aug(k,p:N+1)-m*Aug(p,p:N+1);
   end
end

%Back Substitution on [U|Y] using Program 3.1
X=backsub(Aug(1:N,1:N),Aug(1:N,N+1));


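Exercises for Gaussian Elimination and Pivoting

In Exercises 1 through 4 show that AX = B is equivalent to the upper-triangular system
UX = Y and find the solution.

1.
\[
\begin{aligned}
2x_1 + 4x_2 - 6x_3 &= -4 \\
x_1 + 5x_2 + 3x_3 &= 10 \\
x_1 + 3x_2 + 2x_3 &= 5
\end{aligned}
\qquad
\begin{aligned}
2x_1 + 4x_2 - 6x_3 &= -4 \\
3x_2 + 6x_3 &= 12 \\
3x_3 &= 3
\end{aligned}
\]

2.
\[
\begin{aligned}
x_1 + x_2 + 6x_3 &= 7 \\
-x_1 + 2x_2 + 9x_3 &= 2 \\
x_1 - 2x_2 + 3x_3 &= 10
\end{aligned}
\qquad
\begin{aligned}
x_1 + x_2 + 6x_3 &= 7 \\
3x_2 + 15x_3 &= 9 \\
12x_3 &= 12
\end{aligned}
\]

3.
\[
\begin{aligned}
2x_1 - 2x_2 + 5x_3 &= 6 \\
2x_1 + 3x_2 + x_3 &= 13 \\
-x_1 + 4x_2 - 4x_3 &= 3
\end{aligned}
\qquad
\begin{aligned}
2x_1 - 2x_2 + 5x_3 &= 6 \\
5x_2 - 4x_3 &= 7 \\
0.9x_3 &= 1.8
\end{aligned}
\]

4.
\[
\begin{aligned}
-5x_1 + 2x_2 - x_3 &= -1 \\
x_1 + 0x_2 + 3x_3 &= 5 \\
3x_1 + x_2 + 6x_3 &= 17
\end{aligned}
\qquad
\begin{aligned}
-5x_1 + 2x_2 - x_3 &= -1 \\
0.4x_2 + 2.8x_3 &= 4.8 \\
-10x_3 &= -10
\end{aligned}
\]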
5. Find the parabola $y = A + Bx + Cx^2$ that passes through (1, 4), (2, 7), and (3, 14).

6. Find the parabola $y = A + Bx + Cx^2$ that passes through (1, 6), (2, 5), and (3, 2).

7. Find the cubic $y = A + Bx + Cx^2 + Dx^3$ that passes through (0, 0), (1, 1), (2, 2),
and (3, 2).

In Exercises 8 through 10, show that AX = B is equivalent to the upper-triangular system
UX = Y and find the solution.


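8.
\[
\begin{aligned}
4x_1 + 8x_2 + 4x_3 + 0x_4 &= 8 \\
x_1 + 5x_2 + 4x_3 - 3x_4 &= -4 \\
x_1 + 4x_2 + 7x_3 + 2x_4 &= 10 \\
x_1 + 3x_2 + 0x_3 - 2x_4 &= -4
\end{aligned}
\qquad
\begin{aligned}
4x_1 + 8x_2 + 4x_3 + 0x_4 &= 8 \\
3x_2 + 3x_3 - 3x_4 &= -6 \\
4x_3 + 4x_4 &= 12 \\
x_4 &= 2
\end{aligned}
\]

9.
\[
\begin{aligned}
2x_1 + 4x_2 - 4x_3 + 0x_4 &= 12 \\
x_1 + 5x_2 - 5x_3 - 3x_4 &= 18 \\
2x_1 + 3x_2 + x_3 + 3x_4 &= 8 \\
x_1 + 4x_2 - 2x_3 + 2x_4 &= 8
\end{aligned}
\qquad
\begin{aligned}
2x_1 + 4x_2 - 4x_3 + 0x_4 &= 12 \\
3x_2 - 3x_3 - 3x_4 &= 12 \\
4x_3 + 2x_4 &= 0 \\
3x_4 &= -6
\end{aligned}
\]

10.
\[
\begin{aligned}
x_1 + 2x_2 + 0x_3 - x_4 &= 9 \\
2x_1 + 3x_2 - x_3 + 0x_4 &= 9 \\
0x_1 + 4x_2 + 2x_3 - 5x_4 &= 26 \\
5x_1 + 5x_2 + 2x_3 - 4x_4 &= 32
\end{aligned}
\qquad
\begin{aligned}
x_1 + 2x_2 + 0x_3 - x_4 &= 9 \\
-x_2 - x_3 + 2x_4 &= -9 \\
-2x_3 + 3x_4 &= -10 \\
1.5x_4 &= -3
\end{aligned}
\]

11. Find the solution to the following linear system.

\[
\begin{aligned}
x_1 + 2x_2 &= 7 \\
2x_1 + 3x_2 - x_3 &= 9 \\
4x_2 + 2x_3 + 3x_4 &= 10 \\
2x_3 - 4x_4 &= 12
\end{aligned}
\]

12. Find the solution to the following linear system.

\[
\begin{aligned}
x_1 + x_2 &= 5 \\
2x_1 - x_2 + 5x_3 &= -9 \\
3x_2 - 4x_3 + 2x_4 &= 19 \\
2x_3 + 6x_4 &= 2
\end{aligned}
\]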

13. The Rockmore Corp. is considering the purchase of a new computer and will choose
either the DoGood 174 or the MightDo 11. They test both computers' ability to solve
the linear system

\[
\begin{aligned}
34x + 55y - 21 &= 0 \\
55x + 89y - 34 &= 0.
\end{aligned}
\]

The DoGood 174 computer gives x = -0.11 and y = 0.45, and its check for accuracy
is found by substitution:

\[
\begin{aligned}
34(-0.11) + 55(0.45) - 21 &= 0.01 \\
55(-0.11) + 89(0.45) - 34 &= 0.00.
\end{aligned}
\]

The MightDo 11 computer gives x = -0.99 and y = 1.01, and its check for accuracy
is found by substitution:

\[
\begin{aligned}
34(-0.99) + 55(1.01) - 21 &= 0.89 \\
55(-0.99) + 89(1.01) - 34 &= 1.44.
\end{aligned}
\]

Which computer gave the better answer? Why?

14. Solve the following linear systems using (i) Gaussian elimination with partial
pivoting, and (ii) Gaussian elimination with scaled partial pivoting.

(a)
\[
\begin{aligned}
2x_1 - 3x_2 + 100x_3 &= 1 \\
x_1 + 10x_2 - 0.001x_3 &= 0 \\
3x_1 - 100x_2 + 0.01x_3 &= 0
\end{aligned}
\]

(b)
\[
\begin{aligned}
x_1 + 20x_2 - x_3 + 0.001x_4 &= 0 \\
2x_1 - 5x_2 + 30x_3 - 0.1x_4 &= 1 \\
5x_1 + x_2 - 100x_3 - 10x_4 &= 0 \\
2x_1 - 100x_2 - x_3 + x_4 &= 0
\end{aligned}
\]

15. The Hilbert matrix is a classical ill-conditioned matrix and small changes in its
coefficients will produce a large change in the solution to the perturbed system.

(a) Find the exact solution of AX = B (leave all numbers as fractions and do exact
arithmetic) using the Hilbert matrix of dimension 4 x 4:

\[
A = \begin{bmatrix}
1 & \tfrac12 & \tfrac13 & \tfrac14 \\
\tfrac12 & \tfrac13 & \tfrac14 & \tfrac15 \\
\tfrac13 & \tfrac14 & \tfrac15 & \tfrac16 \\
\tfrac14 & \tfrac15 & \tfrac16 & \tfrac17
\end{bmatrix},
\qquad
B = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}.
\]

(b) Now solve AX = B using four-digit rounding arithmetic:

\[
A = \begin{bmatrix}
1.0000 & 0.5000 & 0.3333 & 0.2500 \\
0.5000 & 0.3333 & 0.2500 & 0.2000 \\
0.3333 & 0.2500 & 0.2000 & 0.1667 \\
0.2500 & 0.2000 & 0.1667 & 0.1429
\end{bmatrix},
\qquad
B = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}.
\]

Note. The coefficient matrix in part (b) is an approximation to the coefficient
matrix in part (a).
Algorithms and Programs

1. Many applications involve matrices with many zeros. Of practical importance are
tridiagonal systems (see Exercises 11 and 12) of the form

\[
\begin{aligned}
d_1x_1 + c_1x_2 &= b_1 \\
a_1x_1 + d_2x_2 + c_2x_3 &= b_2 \\
a_2x_2 + d_3x_3 + c_3x_4 &= b_3 \\
&\ \ \vdots \\
a_{N-2}x_{N-2} + d_{N-1}x_{N-1} + c_{N-1}x_N &= b_{N-1} \\
a_{N-1}x_{N-1} + d_Nx_N &= b_N.
\end{aligned}
\]

Construct a program that will solve a tridiagonal system. You may assume that row
interchanges are not needed and that row k can be used to eliminate $x_k$ in row k + 1.

2. Use Program 3.2 to find the sixth-degree polynomial $y = a_1 + a_2x + a_3x^2 + a_4x^3 +
a_5x^4 + a_6x^5 + a_7x^6$ that passes through (0, 1), (1, 3), (2, 2), (3, 1), (4, 3), (5, 2),
and (6, 1). Use the plot command to plot the polynomial and the given points on the
same graph. Explain any discrepancies in your graph.

3. Use Program 3.2 to solve the linear system AX = B, where $A = [a_{ij}]_{N \times N}$ and
$a_{ij} = i^{j-1}$, and $B = [b_{ij}]_{N \times 1}$, where $b_{11} = N$ and $b_{i1} = (i^N - 1)/(i - 1)$ for $i \geq 2$.
Use N = 3, 7, and 11. The exact solution is $X = [1\ 1\ \cdots\ 1\ 1]'$. Explain any
deviations from the exact solution.

4. Construct a program that changes the pivoting strategy in Program 3.2 to scaled partial 
pivoting. 

5. Use your scaled partial pivoting program from Problem 4 to solve the system given 
in Problem 3 for N = 11. Explain any improvements in the solutions. 

6. Modify Program 3.2 so that it will efficiently solve M linear systems with the same
coefficient matrix A but different column matrices B. The M linear systems look like

$AX_1 = B_1$, $AX_2 = B_2$, ..., $AX_M = B_M$.

7. The following discussion is presented for matrices of dimension 3 x 3, but the concepts
apply to matrices of dimension $N \times N$. If A is nonsingular, then $A^{-1}$ exists and
$AA^{-1} = I$. Let $C_1$, $C_2$, and $C_3$ be the columns of $A^{-1}$ and $E_1$, $E_2$, and $E_3$ be the
columns of I. The equation $AA^{-1} = I$ can be represented as

\[ A[C_1\ C_2\ C_3] = [E_1\ E_2\ E_3]. \]

This matrix product is equivalent to the three linear systems

\[ AC_1 = E_1, \quad AC_2 = E_2, \quad \text{and} \quad AC_3 = E_3. \]

Thus finding $A^{-1}$ is equivalent to solving the three linear systems.

Using Program 3.2 or your program from Problem 6, find the inverse of each of the
following matrices. Check your answer by computing the product $AA^{-1}$ and also by
using the command inv(A). Explain any differences.

\[
\begin{bmatrix}
16 & -120 & 240 & -140 \\
-120 & 1200 & -2700 & 1680 \\
240 & -2700 & 6480 & -4200 \\
-140 & 1680 & -4200 & 2800
\end{bmatrix}
\]


3.5 Triangular Factorization

In Section 3.3 we saw how easy it is to solve an upper-triangular system. Now we
introduce the concept of factorization of a given matrix A into the product of a
lower-triangular matrix L that has 1's along the main diagonal and an upper-triangular
matrix U with nonzero diagonal elements. For ease of notation we illustrate the concepts
with matrices of dimension 4 x 4, but they apply to an arbitrary system of dimension
$N \times N$.

Definition 3.4. The nonsingular matrix A has a triangular factorization if it can
be expressed as the product of a lower-triangular matrix L and an upper-triangular
matrix U:

\[ A = LU. \tag{1} \]

In matrix form, this is written as

\[
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
m_{21} & 1 & 0 & 0 \\
m_{31} & m_{32} & 1 & 0 \\
m_{41} & m_{42} & m_{43} & 1
\end{bmatrix}
\begin{bmatrix}
u_{11} & u_{12} & u_{13} & u_{14} \\
0 & u_{22} & u_{23} & u_{24} \\
0 & 0 & u_{33} & u_{34} \\
0 & 0 & 0 & u_{44}
\end{bmatrix}.
\]

The condition that A is nonsingular implies that $u_{kk} \neq 0$ for all k. The notation
for the entries in L is $m_{ij}$, and the reason for the choice of $m_{ij}$ instead of $l_{ij}$ will be
pointed out soon.

Solution of a Linear System 

Suppose that the coefficient matrix A for the linear system AX = B has a triangular
factorization (1); then the solution to

\[ LUX = B \tag{2} \]





(7) 


iflj 
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Now use back substitution and compute the solution *4 = —24/(—6) = 4, *3 = (6 — 
3(4))/(-2) = 3,x 2 = (10 - 2(4) + 2(3))/4 = 2, and = 21 - 4 - 4(3) - 2(2} = 1 , or 
X=[\ 2 3 4]'. • 

Triangular Factorization 

We now discuss how to obtain the triangular factorization. If row interchanges are not
necessary when using Gaussian elimination, the multipliers $m_{ij}$ are the subdiagonal
entries in $L$.


Example 3.21. Use Gaussian elimination to construct the triangular factorization of the
matrix

$$A = \begin{bmatrix} 4 & 3 & -1 \\ -2 & -4 & 5 \\ 1 & 2 & 6 \end{bmatrix}.$$

The matrix $L$ will be constructed from an identity matrix placed at the left. For each row
operation used to construct the upper-triangular matrix, the multipliers $m_{ij}$ will be put in
their proper places at the left. Start with

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 4 & 3 & -1 \\ -2 & -4 & 5 \\ 1 & 2 & 6 \end{bmatrix}.$$

Row 1 is used to eliminate the elements of $A$ in column 1 below $a_{11}$. The multiples
$m_{21} = -0.5$ and $m_{31} = 0.25$ of row 1 are subtracted from rows 2 and 3, respectively. These
multipliers are put in the matrix at the left and the result is

$$A = \begin{bmatrix} 1 & 0 & 0 \\ -0.5 & 1 & 0 \\ 0.25 & 0 & 1 \end{bmatrix} \begin{bmatrix} 4 & 3 & -1 \\ 0 & -2.5 & 4.5 \\ 0 & 1.25 & 6.25 \end{bmatrix}.$$

Row 2 is used to eliminate the elements of $A$ in column 2 below $a_{22}$. The multiple
$m_{32} = -0.5$ of the second row is subtracted from row 3, and the multiplier is entered in the
matrix at the left. We then have the desired triangular factorization of $A$:

$$A = \begin{bmatrix} 1 & 0 & 0 \\ -0.5 & 1 & 0 \\ 0.25 & -0.5 & 1 \end{bmatrix} \begin{bmatrix} 4 & 3 & -1 \\ 0 & -2.5 & 4.5 \\ 0 & 0 & 8.5 \end{bmatrix}. \tag{8}$$
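
The elimination in Example 3.21 can be sketched in a few lines of MATLAB. This is only an illustrative script, not one of the book's numbered programs; the variable names `L`, `U`, and `residual` are chosen here for the illustration, and no row interchanges are attempted, so it assumes every pivot is nonzero.

```matlab
% Triangular factorization without row interchanges (illustrative sketch).
A = [4 3 -1; -2 -4 5; 1 2 6];      % matrix from Example 3.21
N = size(A,1);
L = eye(N);                        % multipliers will fill the subdiagonal
U = A;                             % reduced to upper-triangular form
for p = 1:N-1
    for r = p+1:N
        L(r,p) = U(r,p)/U(p,p);            % multiplier m_rp
        U(r,:) = U(r,:) - L(r,p)*U(p,:);   % subtract m_rp times row p
    end
end
disp(L)
disp(U)
residual = norm(L*U - A);          % should be zero up to rounding
disp(residual)
```

The displayed factors can be compared entry by entry with (8).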


Theorem 3.10 (Direct Factorization A = LU. No Row Interchanges). Suppose
that Gaussian elimination, without row interchanges, can be successfully performed to
solve the general linear system $AX = B$. Then the matrix $A$ can be factored as the
product of a lower-triangular matrix $L$ and an upper-triangular matrix $U$:

$$A = LU.$$

Furthermore, $L$ can be constructed to have 1's on its diagonal and $U$ will have nonzero
diagonal elements. After finding $L$ and $U$, the solution $X$ is computed in two steps:

1. Solve $LY = B$ for $Y$ using forward substitution.

2. Solve $UX = Y$ for $X$ using back substitution.


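Proof. We will show that, when the Gaussian elimination process is followed and
$B$ is stored in column $N + 1$ of the augmented matrix, the result after the
upper-triangularization step is the equivalent upper-triangular system $UX = Y$. The matrices
$L$, $U$, $B$, and $Y$ will have the form

$$L = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ m_{21} & 1 & 0 & \cdots & 0 \\ m_{31} & m_{32} & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ m_{N1} & m_{N2} & m_{N3} & \cdots & 1 \end{bmatrix}, \quad B = \begin{bmatrix} a_{1\,N+1}^{(1)} \\ a_{2\,N+1}^{(1)} \\ a_{3\,N+1}^{(1)} \\ \vdots \\ a_{N\,N+1}^{(1)} \end{bmatrix},$$

$$U = \begin{bmatrix} a_{11}^{(1)} & a_{12}^{(1)} & a_{13}^{(1)} & \cdots & a_{1N}^{(1)} \\ 0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2N}^{(2)} \\ 0 & 0 & a_{33}^{(3)} & \cdots & a_{3N}^{(3)} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & a_{NN}^{(N)} \end{bmatrix}, \quad Y = \begin{bmatrix} a_{1\,N+1}^{(1)} \\ a_{2\,N+1}^{(2)} \\ a_{3\,N+1}^{(3)} \\ \vdots \\ a_{N\,N+1}^{(N)} \end{bmatrix}.$$

Remark. To find just $L$ and $U$, the $(N + 1)$st column is not needed.

Step 1. Store the coefficients in the augmented matrix. The superscript on $a_{rc}^{(1)}$
means that this is the first time that a number is stored in location $(r, c)$.

$$\left[\begin{array}{ccccc|c} a_{11}^{(1)} & a_{12}^{(1)} & a_{13}^{(1)} & \cdots & a_{1N}^{(1)} & a_{1\,N+1}^{(1)} \\ a_{21}^{(1)} & a_{22}^{(1)} & a_{23}^{(1)} & \cdots & a_{2N}^{(1)} & a_{2\,N+1}^{(1)} \\ a_{31}^{(1)} & a_{32}^{(1)} & a_{33}^{(1)} & \cdots & a_{3N}^{(1)} & a_{3\,N+1}^{(1)} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ a_{N1}^{(1)} & a_{N2}^{(1)} & a_{N3}^{(1)} & \cdots & a_{NN}^{(1)} & a_{N\,N+1}^{(1)} \end{array}\right]$$

Step 2. Eliminate $x_1$ in rows 2 through $N$ and store the multiplier $m_{r1}$, used to
eliminate $x_1$ in row $r$, in the matrix at location $(r, 1)$.

    for r = 2 : N
        $m_{r1} = a_{r1}^{(1)} / a_{11}^{(1)}$;
        $a_{r1} = m_{r1}$;
        for c = 2 : N + 1
            $a_{rc}^{(2)} = a_{rc}^{(1)} - m_{r1} * a_{1c}^{(1)}$;
        end
    end

The new elements are written $a_{rc}^{(2)}$ to indicate that this is the second time that a
number has been stored in the matrix at location $(r, c)$. The result after step 2 is

$$\left[\begin{array}{ccccc|c} a_{11}^{(1)} & a_{12}^{(1)} & a_{13}^{(1)} & \cdots & a_{1N}^{(1)} & a_{1\,N+1}^{(1)} \\ m_{21} & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2N}^{(2)} & a_{2\,N+1}^{(2)} \\ m_{31} & a_{32}^{(2)} & a_{33}^{(2)} & \cdots & a_{3N}^{(2)} & a_{3\,N+1}^{(2)} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ m_{N1} & a_{N2}^{(2)} & a_{N3}^{(2)} & \cdots & a_{NN}^{(2)} & a_{N\,N+1}^{(2)} \end{array}\right]$$

Step 3. Eliminate $x_2$ in rows 3 through $N$ and store the multiplier $m_{r2}$, used to
eliminate $x_2$ in row $r$, in the matrix at location $(r, 2)$.

    for r = 3 : N
        $m_{r2} = a_{r2}^{(2)} / a_{22}^{(2)}$;
        $a_{r2} = m_{r2}$;
        for c = 3 : N + 1
            $a_{rc}^{(3)} = a_{rc}^{(2)} - m_{r2} * a_{2c}^{(2)}$;
        end
    end

The new elements are written $a_{rc}^{(3)}$ to indicate that this is the third time that a
number has been stored in the matrix at the location $(r, c)$. The result after step 3 is

$$\left[\begin{array}{ccccc|c} a_{11}^{(1)} & a_{12}^{(1)} & a_{13}^{(1)} & \cdots & a_{1N}^{(1)} & a_{1\,N+1}^{(1)} \\ m_{21} & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2N}^{(2)} & a_{2\,N+1}^{(2)} \\ m_{31} & m_{32} & a_{33}^{(3)} & \cdots & a_{3N}^{(3)} & a_{3\,N+1}^{(3)} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ m_{N1} & m_{N2} & a_{N3}^{(3)} & \cdots & a_{NN}^{(3)} & a_{N\,N+1}^{(3)} \end{array}\right]$$

Step p + 1. This is the general step. Eliminate $x_p$ in rows $p + 1$ through $N$ and
store the multipliers at the location $(r, p)$.

    for r = p + 1 : N
        $m_{rp} = a_{rp}^{(p)} / a_{pp}^{(p)}$;
        $a_{rp} = m_{rp}$;
        for c = p + 1 : N + 1
            $a_{rc}^{(p+1)} = a_{rc}^{(p)} - m_{rp} * a_{pc}^{(p)}$;
        end
    end

The final result after $x_{N-1}$ has been eliminated from row $N$ is

$$\left[\begin{array}{ccccc|c} a_{11}^{(1)} & a_{12}^{(1)} & a_{13}^{(1)} & \cdots & a_{1N}^{(1)} & a_{1\,N+1}^{(1)} \\ m_{21} & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2N}^{(2)} & a_{2\,N+1}^{(2)} \\ m_{31} & m_{32} & a_{33}^{(3)} & \cdots & a_{3N}^{(3)} & a_{3\,N+1}^{(3)} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ m_{N1} & m_{N2} & m_{N3} & \cdots & a_{NN}^{(N)} & a_{N\,N+1}^{(N)} \end{array}\right]$$

The upper-triangularization process is now complete. Notice that one array is used to store
the elements of both $L$ and $U$. The 1's of $L$ are not stored, nor are the 0's of $L$ and
$U$ that lie above and below the diagonal, respectively. Only the essential coefficients
needed to reconstruct $L$ and $U$ are stored.

We must now verify that the product $LU = A$. Suppose that $D = LU$ and
consider the case when $r \le c$. Then $d_{rc}$ is

$$d_{rc} = m_{r1}a_{1c}^{(1)} + m_{r2}a_{2c}^{(2)} + \cdots + m_{r\,r-1}a_{r-1\,c}^{(r-1)} + a_{rc}^{(r)}. \tag{9}$$

Using the replacement equations in steps 1 through $p + 1 = r$, we obtain the following
substitutions:

$$\begin{aligned} m_{r1}a_{1c}^{(1)} &= a_{rc}^{(1)} - a_{rc}^{(2)}, \\ m_{r2}a_{2c}^{(2)} &= a_{rc}^{(2)} - a_{rc}^{(3)}, \\ &\;\;\vdots \\ m_{r\,r-1}a_{r-1\,c}^{(r-1)} &= a_{rc}^{(r-1)} - a_{rc}^{(r)}. \end{aligned} \tag{10}$$

When the substitutions in (10) are used in (9), the result is

$$d_{rc} = a_{rc}^{(1)} - a_{rc}^{(2)} + a_{rc}^{(2)} - a_{rc}^{(3)} + \cdots + a_{rc}^{(r-1)} - a_{rc}^{(r)} + a_{rc}^{(r)} = a_{rc}^{(1)}.$$
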

The other case, $r > c$, is similar to prove.

Computational Complexity

The process for triangularizing is the same for both the Gaussian elimination and
triangular factorization methods. We can count the operations if we look at the first $N$
columns of the augmented matrix in Theorem 3.10. The outer loop of step $p + 1$
requires $N - p = N - (p + 1) + 1$ divisions to compute the multipliers $m_{rp}$. Inside the
loops, but for the first $N$ columns only, a total of $(N - p)(N - p)$ multiplications and
the same number of subtractions are required to compute the new row elements $a_{rc}^{(p+1)}$.
This process is carried out for $p = 1, 2, \ldots, N - 1$. Thus the triangular factorization
portion of $A = LU$ requires

$$\sum_{p=1}^{N-1} (N - p)(N - p + 1) = \frac{N^3 - N}{3} \quad \text{multiplications and divisions} \tag{11}$$

and

$$\sum_{p=1}^{N-1} (N - p)(N - p) = \frac{2N^3 - 3N^2 + N}{6} \quad \text{subtractions.} \tag{12}$$

To establish (11), we use the summation formulas

$$\sum_{k=1}^{M} k = \frac{M(M + 1)}{2} \quad \text{and} \quad \sum_{k=1}^{M} k^2 = \frac{M(M + 1)(2M + 1)}{6}.$$

Using the change of variables $k = N - p$, we rewrite (11) as

$$\begin{aligned} \sum_{p=1}^{N-1} (N - p)(N - p + 1) &= \sum_{p=1}^{N-1} (N - p) + \sum_{p=1}^{N-1} (N - p)^2 \\ &= \sum_{k=1}^{N-1} k + \sum_{k=1}^{N-1} k^2 \\ &= \frac{(N - 1)N}{2} + \frac{(N - 1)(N)(2N - 1)}{6} \\ &= \frac{N^3 - N}{3}. \end{aligned}$$

Once the triangular factorization $A = LU$ has been obtained, the solution to the
lower-triangular system $LY = B$ will require $0 + 1 + \cdots + (N - 1) = (N^2 - N)/2$
multiplications and subtractions; no divisions are required because the diagonal
elements of $L$ are 1's. Then the solution of the upper-triangular system $UX = Y$ requires
$1 + 2 + \cdots + N = (N^2 + N)/2$ multiplications and divisions and $(N^2 - N)/2$
subtractions. Therefore, finding the solution to $LUX = B$ requires

$N^2$ multiplications and divisions, and $N^2 - N$ subtractions.
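
As a quick sanity check of these counts, the loops of the triangularization step can be run with counters in place of arithmetic. This is a hedged sketch only: the size `N = 6` is an arbitrary choice, and the counter names `muldiv` and `subs` are introduced here for illustration.

```matlab
% Count the operations of the triangularization step (first N columns only).
N = 6;                        % arbitrary illustrative size
muldiv = 0;                   % multiplications and divisions
subs   = 0;                   % subtractions
for p = 1:N-1
    for r = p+1:N
        muldiv = muldiv + 1;          % one division for the multiplier m_rp
        for c = p+1:N
            muldiv = muldiv + 1;      % one multiplication m_rp*a_pc
            subs   = subs + 1;        % one subtraction
        end
    end
end
fprintf('mult/div: counted %d, formula %d\n', muldiv, (N^3 - N)/3);
fprintf('subtract: counted %d, formula %d\n', subs, (2*N^3 - 3*N^2 + N)/6);
```

Changing `N` and rerunning gives a numerical check of the closed forms for other sizes.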






We see that the bulk of the calculations lies in the triangularization portion of the 
solution. If the linear system is to be solved many times, with the same coefficient 
matrix A but with different column matrices B, it is not necessary to triangularize the 
matrix each time if the factors are saved. This is the reason the triangular factorization 
method is usually chosen over the elimination method. However, if only one linear 
system is solved, the two methods are the same, except that the triangular factorization 
method stores the multipliers. 


Permutation Matrices 

The $A = LU$ factorization in Theorem 3.10 assumes that there are no row
interchanges. It is possible that a nonsingular matrix $A$ cannot be directly factored as
$A = LU$.

Example 3.22. Show that the following matrix cannot be directly factored as $A = LU$:

$$A = \begin{bmatrix} 1 & 2 & 6 \\ 4 & 8 & -1 \\ -2 & 3 & 5 \end{bmatrix}.$$

Suppose that $A$ has a direct factorization $LU$; then

$$\begin{bmatrix} 1 & 2 & 6 \\ 4 & 8 & -1 \\ -2 & 3 & 5 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ m_{21} & 1 & 0 \\ m_{31} & m_{32} & 1 \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix}. \tag{13}$$

The matrices $L$ and $U$ on the right-hand side of (13) can be multiplied and each element
of the product compared with the corresponding element of the matrix $A$. In the first
column, $1 = 1u_{11}$, then $4 = m_{21}u_{11} = m_{21}$, and finally $-2 = m_{31}u_{11} = m_{31}$. In
the second column, $2 = 1u_{12}$, then $8 = m_{21}u_{12} + u_{22} = (4)(2) + u_{22}$ implies that $u_{22} = 0$,
and finally $3 = m_{31}u_{12} + m_{32}u_{22} = (-2)(2) + m_{32}(0) = -4$, which is a contradiction.
Therefore, $A$ does not have an $LU$ factorization.

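A permutation of the first $N$ positive integers $1, 2, \ldots, N$ is an arrangement $k_1, k_2,
\ldots, k_N$ of these integers in a definite order. For example, $1, 4, 2, 3, 5$ is a permutation of
the five integers $1, 2, 3, 4, 5$. The standard base vectors $E_i = [0 \; 0 \; \cdots \; 0 \; 1_i \; 0 \; \cdots \; 0]$,
for $i = 1, 2, \ldots, N$, are used in the next definition.

Definition 3.5. An $N \times N$ permutation matrix $P$ is a matrix with precisely one entry
whose value is 1 in each column and row, and all of whose other entries are 0. The
rows of $P$ are a permutation of the rows of the identity matrix and can be written as

$$P = [E_{k_1}' \; E_{k_2}' \; \cdots \; E_{k_N}']'. \tag{14}$$

The elements of $P = [p_{ij}]$ have the form

$$p_{ij} = \begin{cases} 1 & j = k_i, \\ 0 & \text{otherwise.} \end{cases}$$

For example, the following $4 \times 4$ matrix is a permutation matrix:

$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix} = [E_2' \; E_1' \; E_4' \; E_3']'. \tag{15}$$
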

Theorem 3.11. Suppose that $P = [E_{k_1}' \; E_{k_2}' \; \cdots \; E_{k_N}']'$ is a permutation matrix.
The product $PA$ is a new matrix whose rows consist of the rows of $A$ rearranged in
the order $\text{row}_{k_1} A, \text{row}_{k_2} A, \ldots, \text{row}_{k_N} A$.

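Example 3.23. Let $A$ be a $4 \times 4$ matrix and let $P$ be the permutation matrix given in (15);
then $PA$ is the matrix whose rows consist of the rows of $A$ rearranged in the order $\text{row}_2 A$,
$\text{row}_1 A$, $\text{row}_4 A$, $\text{row}_3 A$.

Computing the product, we have

$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} = \begin{bmatrix} a_{21} & a_{22} & a_{23} & a_{24} \\ a_{11} & a_{12} & a_{13} & a_{14} \\ a_{41} & a_{42} & a_{43} & a_{44} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix}.$$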


Theorem 3.12. If $P$ is a permutation matrix, then it is nonsingular and $P^{-1} = P'$.

Theorem 3.13. If $A$ is a nonsingular matrix, then there exists a permutation matrix
$P$ so that $PA$ has a triangular factorization

$$PA = LU. \tag{16}$$

The proofs can be found in advanced linear algebra texts.

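Example 3.24. If rows 2 and 3 of the matrix in Example 3.22 are interchanged, then the
resulting matrix $PA$ has a triangular factorization.

The permutation matrix that switches rows 2 and 3 is $P = [E_1' \; E_3' \; E_2']'$. Computing
the product $PA$, we obtain

$$PA = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 2 & 6 \\ 4 & 8 & -1 \\ -2 & 3 & 5 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 6 \\ -2 & 3 & 5 \\ 4 & 8 & -1 \end{bmatrix}.$$

Now Gaussian elimination without row interchanges can be used:

$$\begin{array}{r} \text{pivot} \rightarrow \\ m_{21} = -2 \\ m_{31} = 4 \end{array} \begin{bmatrix} 1 & 2 & 6 \\ -2 & 3 & 5 \\ 4 & 8 & -1 \end{bmatrix}.$$

After $x_2$ has been eliminated from column 2, row 3, we have

$$\begin{array}{r} \\ \text{pivot} \rightarrow \\ m_{32} = 0 \end{array} \begin{bmatrix} 1 & 2 & 6 \\ 0 & 7 & 17 \\ 0 & 0 & -25 \end{bmatrix} = U.$$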
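
The result of Example 3.24 can be checked numerically. In this short sketch the matrix `L` is assembled by hand from the multipliers $m_{21} = -2$, $m_{31} = 4$, and $m_{32} = 0$ found above; the variable names are chosen here only for the illustration.

```matlab
% Verify PA = LU for Example 3.24.
A = [1 2 6; 4 8 -1; -2 3 5];
P = [1 0 0; 0 0 1; 0 1 0];     % interchanges rows 2 and 3
L = [1 0 0; -2 1 0; 4 0 1];    % unit lower-triangular, built from the multipliers
U = [1 2 6; 0 7 17; 0 0 -25];
disp(P*A - L*U)                % the zero matrix when PA = LU
```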


Extending the Gaussian Elimination Process 

The following theorem is an extension of Theorem 3.10, which includes the cases 
when row interchanges are required. Thus triangular factorization can be used to find 
the solution to any linear system AX = B , where A is nonsingular. 

Theorem 3.14 (Indirect Factorization: PA = LU). Let $A$ be a given $N \times N$
matrix. Assume that Gaussian elimination can be performed successfully to solve the
general linear system $AX = B$, but that row interchanges are required. Then there
exists a permutation matrix $P$ so that the product $PA$ can be factored as the product
of a lower-triangular matrix $L$ and an upper-triangular matrix $U$:

$$PA = LU.$$

Furthermore, $L$ can be constructed to have 1's on its main diagonal and $U$ will have
nonzero diagonal elements. The solution $X$ is found in four steps:

1. Construct the matrices $L$, $U$, and $P$.

2. Compute the column vector $PB$.

3. Solve $LY = PB$ for $Y$ using forward substitution.

4. Solve $UX = Y$ for $X$ using back substitution.
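
The four steps can be sketched with MATLAB's built-in `lu` command, factoring once and reusing the factors for several right-hand sides. This is an illustrative script under stated assumptions: the two columns of `Bs` are made-up right-hand sides (chosen so that the exact solutions are $[1\;1\;1]'$ and $[1\;2\;3]'$), and the backslash operator stands in for explicit forward and back substitution loops.

```matlab
% Factor once, then solve for several right-hand sides (illustrative sketch).
A  = [1 2 6; 4 8 -1; -2 3 5];   % matrix of Example 3.22
Bs = [9 23; 11 17; 6 19];       % hypothetical right-hand sides, one per column
[L,U,P] = lu(A);                % step 1: performed only once
X = zeros(size(Bs));
for k = 1:size(Bs,2)
    PB = P*Bs(:,k);             % step 2: permute the right-hand side
    Y  = L\PB;                  % step 3: forward substitution (L is triangular)
    X(:,k) = U\Y;               % step 4: back substitution (U is triangular)
end
disp(X)
```

Only the loop body is repeated for each new column matrix $B$; the factorization itself is not recomputed.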

Remark. Suppose that $AX = B$ is to be solved for a fixed matrix $A$ and several different
column matrices $B$. Then step 1 is performed only once and steps 2 through 4 are
used to find the solution $X$ that corresponds to $B$. Steps 2 through 4 are a computationally
efficient way to construct the solution $X$ and require $O(N^2)$ operations instead of
the $O(N^3)$ operations required by Gaussian elimination.

MATLAB

The MATLAB command [L,U,P]=lu(A) creates the lower-triangular matrix L, the
upper-triangular matrix U (from the triangular factorization of A), and the permutation
matrix P from Theorem 3.14.

Example 3.25. Use the MATLAB command [L,U,P]=lu(A) on the matrix $A$ in
Example 3.22. Verify that $A = P^{-1}LU$ (equivalent to showing that $PA = LU$).

>>A=[1 2 6;4 8 -1;-2 3 5];
>>[L,U,P]=lu(A)
L=
    1.0000         0         0
   -0.5000    1.0000         0
    0.2500         0    1.0000
U=
    4.0000    8.0000   -1.0000
         0    7.0000    4.5000
         0         0    6.2500
P=
     0     1     0
     0     0     1
     1     0     0
>>inv(P)*L*U
     1     2     6
     4     8    -1
    -2     3     5

As previously indicated, the triangular factorization method is often chosen over the
elimination method. In addition, it is used in the inv(A) and det(A) commands in
MATLAB. For example, from the study of linear algebra we know that the determinant
of a nonsingular matrix $A$ equals $(-1)^q \det U$, where $U$ is the upper-triangular matrix
from the triangular factorization of $A$ and $q$ is the number of row interchanges required
to obtain $P$ from the identity matrix $I$. Since $U$ is an upper-triangular matrix, we
know that the determinant of $U$ is just the product of the elements on its main diagonal
(Theorem 3.6). The reader should verify in Example 3.25 that $\det(A) = 175 =
(-1)^2(175) = (-1)^2 \det(U)$.
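
A minimal sketch of this determinant computation follows. It is an illustration only: the sign factor $(-1)^q$ is obtained here as `det(P)`, which is a shortcut assumed for this small example rather than a count of the row interchanges.

```matlab
% Determinant from the triangular factorization PA = LU (illustrative sketch).
A = [1 2 6; 4 8 -1; -2 3 5];    % matrix of Example 3.25
[L,U,P] = lu(A);
signP = det(P);                 % equals (-1)^q, where q counts row interchanges
d = signP*prod(diag(U));        % det(L) = 1, so det(A) = (-1)^q * det(U)
disp(d)
```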

The following program implements the process described in the proof of
Theorem 3.10. It is an extension of Program 3.2 and uses partial pivoting. The
interchanging of rows due to partial pivoting is recorded in the matrix R. The matrix R is then
used in the forward substitution step to find the matrix Y.

Program 3.3 (PA = LU: Factorization with Pivoting). To construct the
solution to the linear system $AX = B$, where $A$ is a nonsingular matrix.

function X = lufact(A,B)
%Input  - A is an N x N matrix
%       - B is an N x 1 matrix
%Output - X is an N x 1 matrix containing the solution to AX = B.
%Initialize X, Y, the temporary storage matrix C, and the row
%permutation information matrix R
N=size(A,1);
X=zeros(N,1);
Y=zeros(N,1);
C=zeros(1,N);
R=1:N;
for p=1:N-1
   %Find the pivot row for column p
   [max1,j]=max(abs(A(p:N,p)));
   %Interchange row p and j
   C=A(p,:);
   A(p,:)=A(j+p-1,:);
   A(j+p-1,:)=C;
   d=R(p);
   R(p)=R(j+p-1);
   R(j+p-1)=d;
   if A(p,p)==0
      'A is singular. No unique solution'
      break
   end
   %Calculate multiplier and place in subdiagonal portion of A
   for k=p+1:N
      mult=A(k,p)/A(p,p);
      A(k,p)=mult;
      A(k,p+1:N)=A(k,p+1:N)-mult*A(p,p+1:N);
   end
end
%Solve for Y
Y(1)=B(R(1));
for k=2:N
   Y(k)=B(R(k))-A(k,1:k-1)*Y(1:k-1);
end
%Solve for X
X(N)=Y(N)/A(N,N);
for k=N-1:-1:1
   X(k)=(Y(k)-A(k,k+1:N)*X(k+1:N))/A(k,k);
end


Exercises for Triangular Factorization 


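1. Solve $LY = B$, $UX = Y$, and verify that $B = AX$ for (a) $B = [-4 \; 10 \; 5]'$ and
(b) $B = [20 \; 49 \; 32]'$, where $A = LU$ is

$$\begin{bmatrix} 2 & 4 & -6 \\ 1 & 5 & 3 \\ 1 & 3 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 1/2 & 1 & 0 \\ 1/2 & 1/3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 4 & -6 \\ 0 & 3 & 6 \\ 0 & 0 & 3 \end{bmatrix}.$$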


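2. Solve $LY = B$, $UX = Y$, and verify that $B = AX$ for (a) $B = [7 \; 2 \; 10]'$ and
(b) $B = [23 \; 35 \; 7]'$, where $A = LU$ is

$$\begin{bmatrix} 1 & 1 & 6 \\ -1 & 2 & 9 \\ 1 & -2 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 6 \\ 0 & 3 & 15 \\ 0 & 0 & 12 \end{bmatrix}.$$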


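3. Find the triangular factorization $A = LU$ for the matrices

$$\text{(a)} \; \begin{bmatrix} -5 & 2 & -1 \\ 1 & 0 & 3 \\ 3 & 1 & 6 \end{bmatrix} \qquad \text{(b)} \; \begin{bmatrix} 1 & 0 & 3 \\ 3 & 1 & 6 \\ -5 & 2 & -1 \end{bmatrix}$$

4. Find the triangular factorization $A = LU$ for the matrices

$$\text{(a)} \; \begin{bmatrix} 4 & 2 & 1 \\ 2 & 5 & -2 \\ 1 & -2 & 7 \end{bmatrix} \qquad \text{(b)} \; \begin{bmatrix} 1 & -2 & 7 \\ 4 & 2 & 1 \\ 2 & 5 & -2 \end{bmatrix}$$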


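5. Solve $LY = B$, $UX = Y$, and verify that $B = AX$ for (a) $B = [8 \; -4 \; 10 \; -4]'$
and (b) $B = [28 \; 13 \; 23 \; 4]'$, where $A = LU$ is

$$\begin{bmatrix} 4 & 8 & 4 & 0 \\ 1 & 5 & 4 & -3 \\ 1 & 4 & 7 & 2 \\ 1 & 3 & 0 & -2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1/4 & 1 & 0 & 0 \\ 1/4 & 2/3 & 1 & 0 \\ 1/4 & 1/3 & -1/2 & 1 \end{bmatrix} \begin{bmatrix} 4 & 8 & 4 & 0 \\ 0 & 3 & 3 & -3 \\ 0 & 0 & 4 & 4 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$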


6. Find the triangular factorization $A = LU$ for the matrix

$$\begin{bmatrix} 1 & 1 & 0 & 4 \\ 2 & -1 & 5 & 0 \\ 5 & 2 & 1 & 2 \\ -3 & 0 & 2 & 6 \end{bmatrix}.$$


7. Establish the formula in (12).

8. Show that a triangular factorization is unique in the following sense: If $A$ is
nonsingular and $L_1U_1 = A = L_2U_2$, then $L_1 = L_2$ and $U_1 = U_2$.

9. Prove the case $r > c$ at the end of Theorem 3.10.

10. (a) Verify Theorem 3.12 by showing that $PP' = I = P'P$ for the permutation
matrix

$$P = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}.$$

(b) Prove Theorem 3.12. Hint. Use the definition of matrix multiplication and the
fact that each row and column of $P$ and $P'$ contains exactly one 1.

11. Prove that the inverse of a nonsingular $N \times N$ upper-triangular matrix is an
upper-triangular matrix.


Algorithms and Programs 


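1. Use Program 3.3 to solve the system $AX = B$, where

$$A = \begin{bmatrix} 1 & 3 & 5 & 7 \\ 2 & -1 & 3 & 5 \\ 0 & 0 & 2 & 5 \\ -2 & -6 & -3 & 1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}.$$

Use the [L,U,P]=lu(A) command in MATLAB to check your answer.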

2. Use Program 3.3 to solve the linear system $AX = B$, where $A = [a_{ij}]_{N \times N}$ and
$a_{ij} = i^{j-1}$, and $B = [b_{ij}]_{N \times 1}$, where $b_{11} = N$ and $b_{i1} = (i^N - 1)/(i - 1)$ for $i \ge 2$.
Use $N = 3$, 7, and 11. The exact solution is $X = [1 \; 1 \; \cdots \; 1 \; 1]'$. Explain any
deviations from the exact solution.

3. Modify Program 3.3 so that it will compute $A^{-1}$ by repeatedly solving $N$ linear
systems

$$AC_J = E_J \quad \text{for } J = 1, 2, \ldots, N.$$

Then

$$A[C_1 \; C_2 \; \cdots \; C_N] = [E_1 \; E_2 \; \cdots \; E_N]$$

and

$$A^{-1} = [C_1 \; C_2 \; \cdots \; C_N].$$

Make sure that you compute the LU factorization only once!



Figure 3.5 The electrical network 
for Exercise 4. 


4. Kirchhoff's voltage law says that the sum of the voltage drops around any closed path
in the network in a given direction is zero. When this principle is applied to the circuit
shown in Figure 3.5, we obtain the following linear system of equations:

$$\begin{aligned} (R_1 + R_3 + R_4)I_1 + R_3I_2 + R_4I_3 &= E_1 \\ R_3I_1 + (R_2 + R_3 + R_5)I_2 - R_5I_3 &= E_2 \\ R_4I_1 - R_5I_2 + (R_4 + R_5 + R_6)I_3 &= 0. \end{aligned} \tag{17}$$

Use Program 3.3 to solve for the currents $I_1$, $I_2$, and $I_3$ if
(a) $R_1 = 1$, $R_2 = 1$, $R_3 = 2$, $R_4 = 1$, $R_5 = 2$, $R_6 = 4$, and $E_1 = 23$, $E_2 = 29$
(b) $R_1 = 1$, $R_2 = 0.75$, $R_3 = 1$, $R_4 = 2$, $R_5 = 1$, $R_6 = 4$, and $E_1 = 12$,
$E_2 = 21.5$

(c) $R_1 = 1$, $R_2 = 2$, $R_3 = 4$, $R_4 = 3$, $R_5 = 1$, $R_6 = 5$, and $E_1 = 41$, $E_2 = 38$

5. In calculus the following integral would be found by the technique of partial fractions:

$$\int \frac{x^2 + x + 1}{(x - 1)(x - 2)(x - 3)^2(x^2 + 1)} \, dx.$$

This would require finding the coefficients $A_i$, for $i = 1, 2, \ldots, 6$, in the expression

$$\frac{x^2 + x + 1}{(x - 1)(x - 2)(x - 3)^2(x^2 + 1)} = \frac{A_1}{(x - 1)} + \frac{A_2}{(x - 2)} + \frac{A_3}{(x - 3)^2} + \frac{A_4}{(x - 3)} + \frac{A_5x + A_6}{(x^2 + 1)}.$$

Use Program 3.3 to find the partial fraction coefficients.

6. Use Program 3.3 to solve the linear system $AX = B$, where $A$ is generated using
the MATLAB command A=rand(10,10) and B=[1 2 3 ... 10]'. Remember
to verify that $A$ is nonsingular ($\det(A) \neq 0$) before using Program 3.3. Check
the accuracy of your answer by forming the matrix difference $AX - B$ and
examining how close the elements are to zero (an accurate answer would produce
$AX - B = 0$). Repeat this process using a coefficient matrix $A$ generated by the
command A=rand(20,20) and B=[1 2 3 ... 20]'. Explain any apparent
differences in the accuracy of Program 3.3 on these two systems.

7. In (8) of Section 3.1 we defined the concept of linear combination in $N$-dimensional
space. For example, the vector $(4, -3)$, which is equivalent to the matrix $[4 \; -3]'$,
could be written as a linear combination of $[1 \; 0]'$ and $[0 \; 1]'$:

$$\begin{bmatrix} 4 \\ -3 \end{bmatrix} = 4 \begin{bmatrix} 1 \\ 0 \end{bmatrix} - 3 \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$

Use Program 3.3 to show that the matrix $[1 \; 3 \; 5 \; 7 \; 9]'$ can be written as a linear
combination of



Explain why any matrix $[x_1 \; x_2 \; x_3 \; x_4 \; x_5]'$ can be written as a linear
combination of these matrices.


3.6 Iterative Methods for Linear Systems 

The goal of this chapter is to extend some of the iterative methods introduced in Chap¬ 
ter 2 to higher dimensions. We consider an extension of fixed-point iteration that ap¬ 
plies to systems of linear equations. 


Jacobi Iteration 

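Example 3.26. Consider the system of equations

$$\begin{aligned} 4x - y + z &= 7 \\ 4x - 8y + z &= -21 \\ -2x + y + 5z &= 15. \end{aligned} \tag{1}$$

These equations can be written in the form

$$\begin{aligned} x &= \frac{7 + y - z}{4} \\ y &= \frac{21 + 4x + z}{8} \\ z &= \frac{15 + 2x - y}{5}. \end{aligned} \tag{2}$$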
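
A Jacobi iteration based on the rearranged equations (2) can be sketched as follows. This is a hedged illustration: the starting point $(1, 2, 2)$ is the one used in the later examples of this section, the count of 50 iterations is an arbitrary choice, and the names `P`, `Pnew`, and `P1` are introduced only for this sketch.

```matlab
% Jacobi iteration for system (1): every update uses only the previous point.
P = [1; 2; 2];                           % assumed starting point (x0, y0, z0)
for k = 1:50
    Pnew = [ (7 + P(2) - P(3))/4;        % x from the first equation of (2)
             (21 + 4*P(1) + P(3))/8;     % y from the second equation
             (15 + 2*P(1) - P(2))/5 ];   % z from the third equation
    if k == 1
        P1 = Pnew;                       % keep the first iterate for inspection
    end
    P = Pnew;
end
disp(P1')
disp(P')
```

The final point printed can be compared with the solution $(2, 4, 3)$.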


Linear systems with as many as 100,000 variables often arise in the solution of
partial differential equations. The coefficient matrices for these systems are sparse;
that is, a large percentage of the entries of the coefficient matrix are zero. If there
is a pattern to the nonzero entries (e.g., tridiagonal systems), then an iterative process
provides an efficient method for solving these large systems.

Sometimes the Jacobi method does not work. Let us experiment and see that a
rearrangement of the original linear system can result in a system of iteration equations
that will produce a divergent sequence of points.

Example 3.27. Let the linear system (1) be rearranged as follows:

$$\begin{aligned} -2x + y + 5z &= 15 \\ 4x - 8y + z &= -21 \\ 4x - y + z &= 7. \end{aligned} \tag{4}$$

These equations can be written in the form

$$\begin{aligned} x &= \frac{-15 + y + 5z}{2} \\ y &= \frac{21 + 4x + z}{8} \\ z &= 7 - 4x + y. \end{aligned} \tag{5}$$

This suggests the following Jacobi iterative process:

$$\begin{aligned} x_{k+1} &= \frac{-15 + y_k + 5z_k}{2} \\ y_{k+1} &= \frac{21 + 4x_k + z_k}{8} \\ z_{k+1} &= 7 - 4x_k + y_k. \end{aligned} \tag{6}$$

See that if we start with $P_0 = (x_0, y_0, z_0) = (1, 2, 2)$, then the iteration using (6) will
diverge away from the solution $(2, 4, 3)$.

Substitute $x_0 = 1$, $y_0 = 2$, and $z_0 = 2$ into the right-hand side of each equation in (6)
to obtain the new values $x_1$, $y_1$, and $z_1$:

$$\begin{aligned} x_1 &= \frac{-15 + 2 + 10}{2} = -1.5 \\ y_1 &= \frac{21 + 4 + 2}{8} = 3.375 \\ z_1 &= 7 - 4 + 2 = 5.00. \end{aligned}$$

The new point $P_1 = (-1.5, 3.375, 5.00)$ is farther away from the solution $(2, 4, 3)$ than $P_0$.
Iteration using the equations in (6) produces a divergent sequence (see Table 3.3).
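
The divergence can be observed numerically with a few steps of (6). In this sketch the number of steps (six) and the names `hist`, `dist`, and `sol` are arbitrary choices made for illustration.

```matlab
% Jacobi iteration (6) for the rearranged system (4): the iterates move away.
sol  = [2; 4; 3];                 % known solution, used only to measure distance
P    = [1; 2; 2];                 % starting point P0
hist = zeros(3,6);                % iterates P1,...,P6 stored as columns
dist = zeros(1,6);                % Euclidean distance of each iterate from sol
for k = 1:6
    P = [ (-15 + P(2) + 5*P(3))/2;
          (21 + 4*P(1) + P(3))/8;
          7 - 4*P(1) + P(2) ];
    hist(:,k) = P;
    dist(k) = norm(P - sol);
end
disp(hist')
disp(dist)
```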




Table 3.3 Divergent Jacobi Iteration for the Linear
System (4)



Gauss-Seidel Iteration 

Sometimes the convergence can be speeded up. Observe that the Jacobi iterative
process (3) yields three sequences $\{x_k\}$, $\{y_k\}$, and $\{z_k\}$ that converge to 2, 4, and 3,
respectively (see Table 3.2). It seems reasonable that $x_{k+1}$ could be used in place of $x_k$ in
the computation of $y_{k+1}$. Similarly, $x_{k+1}$ and $y_{k+1}$ might be used in the computation
of $z_{k+1}$. The next example shows what happens when this is applied to the equations
in Example 3.26.

Example 3.28. Consider the system of equations given in (1) and the Gauss-Seidel
iterative process suggested by (2):

$$\begin{aligned} x_{k+1} &= \frac{7 + y_k - z_k}{4} \\ y_{k+1} &= \frac{21 + 4x_{k+1} + z_k}{8} \\ z_{k+1} &= \frac{15 + 2x_{k+1} - y_{k+1}}{5}. \end{aligned} \tag{7}$$

See that if we start with $P_0 = (x_0, y_0, z_0) = (1, 2, 2)$, then iteration using (7) will converge
to the solution $(2, 4, 3)$.

Substitute $y_0 = 2$ and $z_0 = 2$ into the first equation of (7) and obtain

$$x_1 = \frac{7 + 2 - 2}{4} = 1.75.$$

Then substitute $x_1 = 1.75$ and $z_0 = 2$ into the second equation and get

$$y_1 = \frac{21 + 4(1.75) + 2}{8} = 3.75.$$

Finally, substitute $x_1 = 1.75$ and $y_1 = 3.75$ into the third equation to get

$$z_1 = \frac{15 + 2(1.75) - 3.75}{5} = 2.95.$$

Table 3.4 Convergent Gauss-Seidel Iteration for the
System (1)

  k     x_k           y_k           z_k
  0     1.0           2.0           2.0
  1     1.75          3.75          2.95
  2     1.95          3.96875       2.98625
  3     1.995625      3.99609375    2.99903125
  ...
  8     1.99999983    3.99999988    2.99999996
  9     1.99999998    3.99999999    3.00000000
 10     2.00000000    4.00000000    3.00000000

The new point $P_1 = (1.75, 3.75, 2.95)$ is closer to $(2, 4, 3)$ than $P_0$ and is better than the
value given in Example 3.26. Iteration using (7) generates a sequence $\{P_k\}$ that converges
to $(2, 4, 3)$ (see Table 3.4).
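
The Gauss-Seidel process (7) can be sketched in MATLAB by overwriting each coordinate as soon as its new value is available. This is an illustrative script, not Program 3.5; the array `hist` is introduced here only to record the iterates.

```matlab
% Gauss-Seidel iteration (7) for system (1), starting from P0 = (1, 2, 2).
x = 1; y = 2; z = 2;
hist = zeros(3,10);               % iterates P1,...,P10 stored as columns
for k = 1:10
    x = (7 + y - z)/4;            % uses the old y and z
    y = (21 + 4*x + z)/8;         % uses the new x and the old z
    z = (15 + 2*x - y)/5;         % uses the new x and the new y
    hist(:,k) = [x; y; z];
    fprintf('%2d  %.8f  %.8f  %.8f\n', k, x, y, z);
end
```

The printed rows can be compared with Table 3.4.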

In view of Examples 3.26 and 3.27, it is necessary to have some criterion to
determine whether the Jacobi iteration will converge. Hence we make the following
definition.

Definition 3.6. A matrix $A$ of dimension $N \times N$ is said to be strictly diagonally
dominant provided that

$$|a_{kk}| > \sum_{\substack{j=1 \\ j \neq k}}^{N} |a_{kj}| \quad \text{for } k = 1, 2, \ldots, N. \tag{8}$$


This means that in each row of the matrix the magnitude of the element on the
main diagonal must exceed the sum of the magnitudes of all other elements in the row.
The coefficient matrix of the linear system (1) in Example 3.26 is strictly diagonally
dominant because

    In row 1:  $|4| > |-1| + |1|$
    In row 2:  $|-8| > |4| + |1|$
    In row 3:  $|5| > |-2| + |1|$.

All the rows satisfy relation (8) in Definition 3.6; therefore, the coefficient matrix $A$
for the linear system (1) is strictly diagonally dominant.

The coefficient matrix $A$ of the linear system (4) in Example 3.27 is not strictly
diagonally dominant because

    In row 1:  $|-2| < |1| + |5|$
    In row 2:  $|-8| > |4| + |1|$
    In row 3:  $|1| < |4| + |-1|$.

Rows 1 and 3 do not satisfy relation (8) in Definition 3.6; therefore, the coefficient
matrix $A$ for the linear system (4) is not strictly diagonally dominant.
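
Relation (8) is easy to test by machine. The following sketch uses a helper named `isSDD`, a hypothetical name introduced here, written as an anonymous function so that the script is self-contained.

```matlab
% Test strict diagonal dominance, relation (8), row by row.
% isSDD is a hypothetical helper: it compares |a_kk| with the sum of the
% magnitudes of the other entries in row k, for every k.
isSDD = @(A) all(abs(diag(A)) > sum(abs(A),2) - abs(diag(A)));
A1 = [4 -1 1; 4 -8 1; -2 1 5];    % coefficient matrix of system (1)
A4 = [-2 1 5; 4 -8 1; 4 -1 1];    % coefficient matrix of system (4)
disp(isSDD(A1))
disp(isSDD(A4))
```

The first test succeeds and the second fails, in agreement with the row-by-row checks above.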

We now generalize the Jacobi and Gauss-Seidel iteration processes. Suppose that
the given linear system is

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1j}x_j + \cdots + a_{1N}x_N &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2j}x_j + \cdots + a_{2N}x_N &= b_2 \\ &\;\;\vdots \\ a_{j1}x_1 + a_{j2}x_2 + \cdots + a_{jj}x_j + \cdots + a_{jN}x_N &= b_j \\ &\;\;\vdots \\ a_{N1}x_1 + a_{N2}x_2 + \cdots + a_{Nj}x_j + \cdots + a_{NN}x_N &= b_N. \end{aligned} \tag{9}$$

Let the $k$th point be $P_k = (x_1^{(k)}, x_2^{(k)}, \ldots, x_j^{(k)}, \ldots, x_N^{(k)})$; then the next point is
$P_{k+1} = (x_1^{(k+1)}, x_2^{(k+1)}, \ldots, x_j^{(k+1)}, \ldots, x_N^{(k+1)})$. The superscript $(k)$ on the
coordinates of $P_k$ enables us to identify the coordinates that belong to this point. The
iteration formulas use row $j$ of (9) to solve for $x_j^{(k+1)}$ in terms of a linear combination
of the previous values $x_1^{(k)}, x_2^{(k)}, \ldots, x_j^{(k)}, \ldots, x_N^{(k)}$.

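Jacobi iteration:

$$x_j^{(k+1)} = \frac{b_j - a_{j1}x_1^{(k)} - \cdots - a_{j\,j-1}x_{j-1}^{(k)} - a_{j\,j+1}x_{j+1}^{(k)} - \cdots - a_{jN}x_N^{(k)}}{a_{jj}} \tag{10}$$

for $j = 1, 2, \ldots, N$.

Jacobi iteration uses all old coordinates to generate all new coordinates, whereas
Gauss-Seidel iteration uses the new coordinates as they become available:

Gauss-Seidel iteration:

$$x_j^{(k+1)} = \frac{b_j - a_{j1}x_1^{(k+1)} - \cdots - a_{j\,j-1}x_{j-1}^{(k+1)} - a_{j\,j+1}x_{j+1}^{(k)} - \cdots - a_{jN}x_N^{(k)}}{a_{jj}} \tag{11}$$

for $j = 1, 2, \ldots, N$.
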
The following theorem gives a sufficient condition for Jacobi iteration to converge.

Theorem 3.15 (Jacobi Iteration). Suppose that $A$ is a strictly diagonally dominant
matrix. Then $AX = B$ has a unique solution $X = P$. Iteration using formula (10)
will produce a sequence of vectors $\{P_k\}$ that will converge to $P$ for any choice of the
starting vector $P_0$.

Proof. The proof can be found in advanced texts on numerical analysis.

It can be proved that the Gauss-Seidel method will also converge when the
matrix $A$ is strictly diagonally dominant. In many cases the Gauss-Seidel method will
converge faster than the Jacobi method; hence it is usually preferred (compare
Examples 3.26 and 3.28). It is important to understand the slight modification of formula
(10) that has been made to obtain formula (11). In some cases the Jacobi method will
converge even though the Gauss-Seidel method will not.

Convergence 

A measure of the closeness between vectors is needed so that we can determine if
$\{P_k\}$ is converging to $P$. The Euclidean distance (see Section 3.1) between
$P = (x_1, x_2, \ldots, x_N)$ and $Q = (y_1, y_2, \ldots, y_N)$ is

$$\|P - Q\| = \left( \sum_{j=1}^{N} (x_j - y_j)^2 \right)^{1/2}. \tag{12}$$

Its disadvantage is that it requires considerable computing effort. Hence we introduce
a different norm, $\|X\|_1$:

$$\|X\|_1 = \sum_{j=1}^{N} |x_j|. \tag{13}$$

The following result ensures that $\|X\|_1$ has the mathematical structure of a metric
and hence is suitable to use as a generalized "distance formula." From the study of
linear algebra we know that on a finite-dimensional vector space all norms are
equivalent; that is, if two vectors are close in the $\|*\|_1$ norm, then they are also close in the
Euclidean norm $\|*\|$.

Theorem 3.16. Let $X$ and $Y$ be $N$-dimensional vectors and $c$ be a scalar. Then the
function $\|X\|_1$ has the following properties:

$$\|X\|_1 \ge 0, \tag{14}$$

$$\|X\|_1 = 0 \quad \text{if and only if} \quad X = 0, \tag{15}$$

$$\|cX\|_1 = |c| \, \|X\|_1, \tag{16}$$

$$\|X + Y\|_1 \le \|X\|_1 + \|Y\|_1. \tag{17}$$

Proof. We prove (17) and leave the others as exercises. For each $j$, the triangle
inequality for real numbers states that $|x_j + y_j| \le |x_j| + |y_j|$. Summing these yields
inequality (17):

$$\|X + Y\|_1 = \sum_{j=1}^{N} |x_j + y_j| \le \sum_{j=1}^{N} |x_j| + \sum_{j=1}^{N} |y_j| = \|X\|_1 + \|Y\|_1.$$

The norm given by (13) can be used to define the distance between points.

Definition 3.7. Suppose that $X$ and $Y$ are two points in $N$-dimensional space. We
define the distance between $X$ and $Y$ in the $\|*\|_1$ norm as

$$\|X - Y\|_1 = \sum_{j=1}^{N} |x_j - y_j|.$$

Example 3.29. Determine the Euclidean distance and $\|*\|_1$ distance between the points
$P = (2, 4, 3)$ and $Q = (1.75, 3.75, 2.95)$.

The Euclidean distance is

$$\|P - Q\| = \left( (2 - 1.75)^2 + (4 - 3.75)^2 + (3 - 2.95)^2 \right)^{1/2} = 0.3571.$$

The $\|*\|_1$ distance is

$$\|P - Q\|_1 = |2 - 1.75| + |4 - 3.75| + |3 - 2.95| = 0.55.$$

The $\|*\|_1$ norm is easier to compute and use for determining convergence in $N$-dimensional
space.
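
Both distances in Example 3.29 can be obtained with the MATLAB `norm` command, which returns the Euclidean norm by default and the $\|*\|_1$ norm when called with a second argument of 1. The variable names `dE` and `d1` below are chosen only for this sketch.

```matlab
% Distances between P and Q of Example 3.29 in two norms.
P = [2 4 3];
Q = [1.75 3.75 2.95];
dE = norm(P - Q);       % Euclidean distance, as in (12)
d1 = norm(P - Q, 1);    % distance in the 1-norm, as in Definition 3.7
fprintf('%.4f  %.2f\n', dE, d1);
```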

The MATLAB command A(j,[1:j-1,j+1:N]) is used in Program 3.4. This
effectively selects all elements in the $j$th row of A, except the element in the $j$th
column (i.e., A(j,j)). This notation is used to simplify the Jacobi iteration (10) step
in Program 3.4.

In both Programs 3.4 and 3.5 we have used the MATLAB command norm, which
is the Euclidean norm. The $\|*\|_1$ norm can also be used, and the reader is encouraged to
check the Help menu in MATLAB or one of the reference works for information on
the norm command.


Program 3.4 (Jacobi Iteration). To solve the linear system $AX = B$ by starting
with an initial guess $X = P_0$ and generating a sequence $\{P_k\}$ that converges to the
solution. A sufficient condition for the method to be applicable is that $A$ is strictly
diagonally dominant.

function X=jacobi(A,B,P,delta,max1)
% Input  - A is an N x N nonsingular matrix
%        - B is an N x 1 matrix
%        - P is an N x 1 matrix; the initial guess
%        - delta is the tolerance for P
%        - max1 is the maximum number of iterations
% Output - X is an N x 1 matrix: the jacobi approximation to
%          the solution of AX = B
N = length(B);
for k=1:max1
   for j=1:N
      X(j)=(B(j)-A(j,[1:j-1,j+1:N])*P([1:j-1,j+1:N]))/A(j,j);
   end
   err=abs(norm(X'-P));
   relerr=err/(norm(X)+eps);
   P=X';
   if (err<delta)|(relerr<delta)
      break
   end
end
X=X';

Program 3.5 (Gauss-Seidel Iteration). To solve the linear system $AX = B$
by starting with the initial guess $X = P_0$ and generating a sequence $\{P_k\}$ that
converges to the solution. A sufficient condition for the method to be applicable is
that $A$ is strictly diagonally dominant.

function X=gseid(A,B,P,delta,max1)
% Input  - A is an N x N nonsingular matrix
%        - B is an N x 1 matrix
%        - P is an N x 1 matrix; the initial guess
%        - delta is the tolerance for P
%        - max1 is the maximum number of iterations
% Output - X is an N x 1 matrix: the gauss-seidel
%          approximation to the solution of AX = B
N = length(B);
for k=1:max1
   for j=1:N
      if j==1
         X(1)=(B(1)-A(1,2:N)*P(2:N))/A(1,1);
      elseif j==N
         X(N)=(B(N)-A(N,1:N-1)*(X(1:N-1))')/A(N,N);
      else
         %X contains the kth approximations and P the (k-1)st
         X(j)=(B(j)-A(j,1:j-1)*X(1:j-1)'...
              -A(j,j+1:N)*P(j+1:N))/A(j,j);
      end
   end
   err=abs(norm(X'-P));
   relerr=err/(norm(X)+eps);
   P=X';
   if (err<delta)|(relerr<delta)
      break
   end
end
X=X';



1/2 
satisfies the four properties given in (14)-(17).

11. Let $X = (x_1, x_2, \ldots, x_N)$. Prove that the $\|\cdot\|_\infty$ norm

$$\|X\|_\infty = \max_{1 \le k \le N} |x_k|$$

satisfies the four properties given in (14)-(17).


Algorithms and Programs

1. Use both Programs 3.4 and 3.5 to solve the linear systems in Exercises 1 through 8.
Use the format long command and delta = $10^{-9}$.

2. In Theorem 3.14 the condition that A be strictly diagonally dominant is a sufficient but
not necessary condition. Use both Programs 3.4 and 3.5 and several different initial
guesses for $P_0$ on the following linear system. Note. The Jacobi iteration appears to
converge, while the Gauss-Seidel iteration diverges.

x + z = 2 
-x + y =0 

x + 2y - 3z = 0 

3. Consider the following tridiagonal linear system, and assume that the coefficient matrix
is strictly diagonally dominant.

$$\begin{aligned}
d_1x_1 + c_1x_2 &= b_1 \\
a_1x_1 + d_2x_2 + c_2x_3 &= b_2 \\
a_2x_2 + d_3x_3 + c_3x_4 &= b_3 \\
&\;\;\vdots \\
a_{N-2}x_{N-2} + d_{N-1}x_{N-1} + c_{N-1}x_N &= b_{N-1} \\
a_{N-1}x_{N-1} + d_Nx_N &= b_N.
\end{aligned}$$

(i) Write an iterative algorithm, following (9)-(11), that will solve this system. Your
algorithm should efficiently use the "sparseness" of the coefficient matrix.

(ii) Construct a MATLAB program based on your algorithm in (i) and solve the following
tridiagonal systems.

(a)
$$\begin{aligned}
4m_1 + m_2 &= 3 \\
m_1 + 4m_2 + m_3 &= 3 \\
m_2 + 4m_3 + m_4 &= 3 \\
m_3 + 4m_4 + m_5 &= 3 \\
&\;\;\vdots \\
m_{48} + 4m_{49} + m_{50} &= 3 \\
m_{49} + 4m_{50} &= 3
\end{aligned}$$

(b)
$$\begin{aligned}
4m_1 + m_2 &= 1 \\
m_1 + 4m_2 + m_3 &= 2 \\
m_2 + 4m_3 + m_4 &= 1 \\
m_3 + 4m_4 + m_5 &= 2 \\
&\;\;\vdots \\
m_{48} + 4m_{49} + m_{50} &= 1 \\
m_{49} + 4m_{50} &= 2
\end{aligned}$$

4. Use Gauss-Seidel iteration to solve the following band system.

$$\begin{aligned}
12x_1 - 2x_2 + x_3 &= 5 \\
-2x_1 + 12x_2 - 2x_3 + x_4 &= 5 \\
x_1 - 2x_2 + 12x_3 - 2x_4 + x_5 &= 5 \\
x_2 - 2x_3 + 12x_4 - 2x_5 + x_6 &= 5 \\
&\;\;\vdots \\
x_{46} - 2x_{47} + 12x_{48} - 2x_{49} + x_{50} &= 5 \\
x_{47} - 2x_{48} + 12x_{49} - 2x_{50} &= 5 \\
x_{48} - 2x_{49} + 12x_{50} &= 5
\end{aligned}$$


5. In Programs 3.4 and 3.5 the relative error between consecutive iterates is used as a
stopping criterion. The problems with using this criterion exclusively were discussed
in Section 2.3. The linear system AX = B can be rewritten as AX - B = 0. If $X_k$
is the kth iterate from a Jacobi or Gauss-Seidel iteration procedure, then the norm of
the residual $AX_k - B$ is, in general, a more appropriate stopping criterion.

Modify Programs 3.4 and 3.5 to use the residual as a stopping criterion. Use the
modified programs to solve the band system in Problem 4.


3.7 Iteration for Nonlinear Systems: Seidel and Newton's Methods (Optional)

Iterative techniques will now be discussed that extend the methods of Chapter 2 and
Section 3.6 to the case of systems of nonlinear functions. Consider the functions

$$\begin{aligned}
f_1(x, y) &= x^2 - 2x - y + 0.5 \\
f_2(x, y) &= x^2 + 4y^2 - 4.
\end{aligned} \tag{1}$$

We seek a method of solution for the system of nonlinear equations

$$f_1(x, y) = 0 \quad\text{and}\quad f_2(x, y) = 0. \tag{2}$$

The equations $f_1(x, y) = 0$ and $f_2(x, y) = 0$ implicitly define curves in the xy-plane.
Hence a solution of the system (2) is a point (p, q) where the two curves cross
Table 3.5 Fixed-Point Iteration Using the Formulas in (5)

Case (i): Start with (0, 1)

  k    p_k           q_k
  0     0.00         1.00
  1    -0.25         1.00
  2    -0.21875      0.9921875
  3    -0.2221680    0.9939880
  4    -0.2223147    0.9938121
  5    -0.2221941    0.9938029
  6    -0.2222163    0.9938095
  7    -0.2222147    0.9938083
  8    -0.2222145    0.9938084
  9    -0.2222146    0.9938084

Case (ii): Start with (2, 0)

  k    p_k            q_k
  0          2.00           0.00
  1          2.25           0.00
  2          2.78125       -0.1328125
  3          4.184082      -0.6085510
  4          9.307547      -2.4820360
  5         44.80623      -15.891091
  6      1,011.995       -392.60426
  7    512,263.2     -205,477.82

This sequence is diverging.


Case (i): If we use the starting value $(p_0, q_0) = (0, 1)$, then

$$p_1 = \frac{0^2 - 1 + 0.5}{2} = -0.25 \quad\text{and}\quad q_1 = \frac{-0^2 - 4(1)^2 + 8(1) + 4}{8} = 1.0.$$

Iteration will generate the sequence in case (i) of Table 3.5. In this case the sequence
converges to the solution that lies near the starting value (0, 1).

Case (ii): If we use the starting value $(p_0, q_0) = (2, 0)$, then

$$p_1 = \frac{2^2 - 0 + 0.5}{2} = 2.25 \quad\text{and}\quad q_1 = \frac{-2^2 - 4(0)^2 + 8(0) + 4}{8} = 0.0.$$

Iteration will generate the sequence in case (ii) of Table 3.5. In this case the sequence
diverges away from the solution.

Iteration using formulas (5) cannot be used to find the second solution (1.900677,
0.3112186). To find this point, a different pair of iteration formulas is needed. Start
with equation (3) and add $-2x$ to the first equation and $-11y$ to the second equation
to get

$$x^2 - 4x - y + 0.5 = -2x \quad\text{and}\quad x^2 + 4y^2 - 11y - 4 = -11y.$$

These equations can then be used to obtain the iteration formulas

$$\begin{aligned}
p_{k+1} &= g_1(p_k, q_k) = \frac{-p_k^2 + 4p_k + q_k - 0.5}{2} \\
q_{k+1} &= g_2(p_k, q_k) = \frac{-p_k^2 - 4q_k^2 + 11q_k + 4}{11}.
\end{aligned} \tag{6}$$

Table 3.6 shows how to use (6) to find the second solution.
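The contrast between the two pairs of formulas can be checked numerically. The following is a minimal sketch, assuming that formulas (5) are the ones used in the case (i) computation above and that (6) is started at (2.00, 0.25); that starting value, the fixed count of 50 iterations, and the names g5, g6, P5, and P6 are assumptions made for illustration.

```matlab
% Sketch: fixed-point iteration with formulas (5) and (6).
g5 = @(p,q) [(p^2 - q + 0.5)/2, (-p^2 - 4*q^2 + 8*q + 4)/8];
g6 = @(p,q) [(-p^2 + 4*p + q - 0.5)/2, (-p^2 - 4*q^2 + 11*q + 4)/11];
P5 = [0 1];         % starting value of case (i)
P6 = [2.0 0.25];    % assumed starting value near the second solution
for k = 1:50        % iteration count is an assumption
    P5 = g5(P5(1), P5(2));
    P6 = g6(P6(1), P6(2));
end
fprintf('%.7f %.7f\n', P5, P6)
```

Under these assumptions the first sequence settles near (-0.2222146, 0.9938084) and the second near (1.900677, 0.3112186), so each pair of formulas locates a different solution of the same system.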





Theory 

We want to determine why equations (6) were suitable for finding the solution near
(1.9, 0.3) and equations (5) were not. In Section 2.1 the size of the derivative at the
fixed point was the necessary idea. When functions of several variables are used, the
partial derivatives must be used. The generalization of "the derivative" for systems
of functions of several variables is the Jacobian matrix. We will consider only a few
introductory ideas regarding this topic. More details can be found in any textbook on
advanced calculus.

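Definition 3.8 (Jacobian Matrix). Assume that $f_1(x, y)$ and $f_2(x, y)$ are functions
of the independent variables x and y; then their Jacobian matrix $J(x, y)$ is

$$\begin{bmatrix}
\dfrac{\partial f_1}{\partial x} & \dfrac{\partial f_1}{\partial y} \\[2mm]
\dfrac{\partial f_2}{\partial x} & \dfrac{\partial f_2}{\partial y}
\end{bmatrix}. \tag{7}$$

Similarly, if $f_1(x, y, z)$, $f_2(x, y, z)$, and $f_3(x, y, z)$ are functions of the independent
variables x, y, and z, then their 3 x 3 Jacobian matrix $J(x, y, z)$ is defined as follows:

$$\begin{bmatrix}
\dfrac{\partial f_1}{\partial x} & \dfrac{\partial f_1}{\partial y} & \dfrac{\partial f_1}{\partial z} \\[2mm]
\dfrac{\partial f_2}{\partial x} & \dfrac{\partial f_2}{\partial y} & \dfrac{\partial f_2}{\partial z} \\[2mm]
\dfrac{\partial f_3}{\partial x} & \dfrac{\partial f_3}{\partial y} & \dfrac{\partial f_3}{\partial z}
\end{bmatrix}. \tag{8}$$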


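Example 3.30. Find the Jacobian matrix $J(x, y, z)$ of order 3 x 3 at the point (1, 3, 2)
for the three functions

$$\begin{aligned}
f_1(x, y, z) &= x^3 - y^2 + y - z^4 + z^2 \\
f_2(x, y, z) &= xy + yz + xz \\
f_3(x, y, z) &= \frac{y}{xz}.
\end{aligned}$$

The Jacobian matrix is

$$J(x, y, z) = \begin{bmatrix}
3x^2 & -2y + 1 & -4z^3 + 2z \\
y + z & x + z & x + y \\
\dfrac{-y}{x^2z} & \dfrac{1}{xz} & \dfrac{-y}{xz^2}
\end{bmatrix}.$$

Thus the Jacobian evaluated at the point (1, 3, 2) is the 3 x 3 matrix

$$J(1, 3, 2) = \begin{bmatrix}
3 & -5 & -28 \\
5 & 3 & 4 \\
-\frac{3}{2} & \frac{1}{2} & -\frac{3}{4}
\end{bmatrix}.$$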

Generalized Differential 

For a function of several variables, the differential is used to show how changes of the
independent variables affect the change in the dependent variables. Suppose that we
have

$$u = f_1(x, y, z), \quad v = f_2(x, y, z), \quad\text{and}\quad w = f_3(x, y, z). \tag{9}$$

Suppose that the values of the functions in (9) are known at the point $(x_0, y_0, z_0)$
and we wish to predict their value at a nearby point $(x, y, z)$. Let du, dv, and dw
denote differential changes in the dependent variables and dx, dy, and dz denote
differential changes in the independent variables. These changes obey the relationships

$$\begin{aligned}
du &= \frac{\partial f_1}{\partial x}(x_0, y_0, z_0)\,dx + \frac{\partial f_1}{\partial y}(x_0, y_0, z_0)\,dy + \frac{\partial f_1}{\partial z}(x_0, y_0, z_0)\,dz, \\
dv &= \frac{\partial f_2}{\partial x}(x_0, y_0, z_0)\,dx + \frac{\partial f_2}{\partial y}(x_0, y_0, z_0)\,dy + \frac{\partial f_2}{\partial z}(x_0, y_0, z_0)\,dz, \\
dw &= \frac{\partial f_3}{\partial x}(x_0, y_0, z_0)\,dx + \frac{\partial f_3}{\partial y}(x_0, y_0, z_0)\,dy + \frac{\partial f_3}{\partial z}(x_0, y_0, z_0)\,dz.
\end{aligned} \tag{10}$$

If vector notation is used, (10) can be compactly written by using the Jacobian
matrix. The function changes are denoted dF and the changes in the variables are denoted dX.

$$dF = \begin{bmatrix} du \\ dv \\ dw \end{bmatrix} = J(x_0, y_0, z_0) \begin{bmatrix} dx \\ dy \\ dz \end{bmatrix} = J(x_0, y_0, z_0)\,dX. \tag{11}$$

Example 3.31. Use the Jacobian matrix to find the differential changes (du, dv, dw)
when the independent variables change from (1, 3, 2) to (1.02, 2.97, 2.01) for the system
of functions

$$\begin{aligned}
u &= f_1(x, y, z) = x^3 - y^2 + y - z^4 + z^2 \\
v &= f_2(x, y, z) = xy + yz + xz \\
w &= f_3(x, y, z) = \frac{y}{xz}.
\end{aligned}$$

Use equation (11) with $J(1, 3, 2)$ of Example 3.30 and the differential changes
(dx, dy, dz) = (0.02, -0.03, 0.01) to obtain

$$\begin{bmatrix} du \\ dv \\ dw \end{bmatrix} =
\begin{bmatrix} 3 & -5 & -28 \\ 5 & 3 & 4 \\ -\frac{3}{2} & \frac{1}{2} & -\frac{3}{4} \end{bmatrix}
\begin{bmatrix} 0.02 \\ -0.03 \\ 0.01 \end{bmatrix} =
\begin{bmatrix} -0.07 \\ 0.05 \\ -0.0525 \end{bmatrix}.$$

Notice that the function values at (1.02, 2.97, 2.01) are close to the linear approximations
obtained by adding the differentials du = -0.07, dv = 0.05, and dw = -0.0525 to
the corresponding function values $f_1(1, 3, 2) = -17$, $f_2(1, 3, 2) = 11$, and $f_3(1, 3, 2) = 1.5$;
that is,

$$\begin{aligned}
f_1(1.02, 2.97, 2.01) &= -17.072 \approx -17.07 = f_1(1, 3, 2) + du \\
f_2(1.02, 2.97, 2.01) &= 11.0493 \approx 11.05 = f_2(1, 3, 2) + dv \\
f_3(1.02, 2.97, 2.01) &= 1.44864 \approx 1.4475 = f_3(1, 3, 2) + dw.
\end{aligned}$$
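The linear prediction in Example 3.31 can be compared with the actual change in the function values. The sketch below assumes that the third function is $f_3 = y/(xz)$, which is consistent with the value $f_3(1, 3, 2) = 1.5$ and with the differential dw = -0.0525; the names F, J, dX, dF, and actual are illustrative only.

```matlab
% Sketch: reproduce the differentials of Example 3.31 numerically.
F = @(x,y,z) [x^3 - y^2 + y - z^4 + z^2; x*y + y*z + x*z; y/(x*z)];
J = @(x,y,z) [3*x^2,       -2*y + 1,  -4*z^3 + 2*z;
              y + z,        x + z,     x + y;
              -y/(x^2*z),   1/(x*z),  -y/(x*z^2)];
dX = [0.02; -0.03; 0.01];
dF = J(1,3,2)*dX;                        % linear prediction of the change
actual = F(1.02,2.97,2.01) - F(1,3,2);   % true change in the function values
disp([dF actual])
```

The two columns agree to about two decimal places, which is what one expects from a first-order approximation with changes of size 0.01 to 0.03 in the independent variables.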


Convergence Near Fixed Points 

The extensions of the definitions and theorems in Section 2.1 to the case of two and
three dimensions are now given. The notation for N-dimensional functions has not
been used. The reader can easily find these extensions in many books on numerical
analysis.

Definition 3.9. A fixed point for the system of two equations

$$x = g_1(x, y) \quad\text{and}\quad y = g_2(x, y) \tag{12}$$

is a point (p, q) such that $p = g_1(p, q)$ and $q = g_2(p, q)$. Similarly, in three dimensions
a fixed point for the system

$$x = g_1(x, y, z), \quad y = g_2(x, y, z), \quad\text{and}\quad z = g_3(x, y, z) \tag{13}$$

is a point (p, q, r) such that $p = g_1(p, q, r)$, $q = g_2(p, q, r)$, and $r = g_3(p, q, r)$.

Definition 3.10. For the functions (12), fixed-point iteration is

$$p_{k+1} = g_1(p_k, q_k) \quad\text{and}\quad q_{k+1} = g_2(p_k, q_k) \tag{14}$$

for k = 0, 1, .... Similarly, for the functions (13), fixed-point iteration is

$$\begin{aligned}
p_{k+1} &= g_1(p_k, q_k, r_k) \\
q_{k+1} &= g_2(p_k, q_k, r_k) \\
r_{k+1} &= g_3(p_k, q_k, r_k)
\end{aligned} \tag{15}$$

for k = 0, 1, ....


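Theorem 3.17 (Fixed-Point Iteration). Assume that the functions in (12) and (13)
and their first partial derivatives are continuous on a region that contains the fixed point
(p, q) or (p, q, r), respectively. If the starting point is chosen sufficiently close to the
fixed point, then one of the following cases applies.

Case (i): Two dimensions. If $(p_0, q_0)$ is sufficiently close to (p, q) and if

$$\begin{aligned}
\left|\frac{\partial g_1}{\partial x}(p, q)\right| + \left|\frac{\partial g_1}{\partial y}(p, q)\right| &< 1, \\
\left|\frac{\partial g_2}{\partial x}(p, q)\right| + \left|\frac{\partial g_2}{\partial y}(p, q)\right| &< 1,
\end{aligned} \tag{16}$$

then the iteration in (14) converges to the fixed point (p, q).

Case (ii): Three dimensions. If $(p_0, q_0, r_0)$ is sufficiently close to (p, q, r) and if

$$\begin{aligned}
\left|\frac{\partial g_1}{\partial x}(p, q, r)\right| + \left|\frac{\partial g_1}{\partial y}(p, q, r)\right| + \left|\frac{\partial g_1}{\partial z}(p, q, r)\right| &< 1, \\
\left|\frac{\partial g_2}{\partial x}(p, q, r)\right| + \left|\frac{\partial g_2}{\partial y}(p, q, r)\right| + \left|\frac{\partial g_2}{\partial z}(p, q, r)\right| &< 1, \\
\left|\frac{\partial g_3}{\partial x}(p, q, r)\right| + \left|\frac{\partial g_3}{\partial y}(p, q, r)\right| + \left|\frac{\partial g_3}{\partial z}(p, q, r)\right| &< 1,
\end{aligned} \tag{17}$$

then the iteration in (15) converges to the fixed point (p, q, r).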


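If conditions (16) or (17) are not met, the iteration might diverge. This will usually
be the case if the sum of the magnitudes of the partial derivatives is much larger than 1.
Theorem 3.17 can be used to show why the iteration (5) converged to the fixed point
near (-0.2, 1.0). The partial derivatives are

$$\begin{aligned}
\frac{\partial}{\partial x} g_1(x, y) &= x, & \frac{\partial}{\partial y} g_1(x, y) &= -\frac{1}{2}, \\
\frac{\partial}{\partial x} g_2(x, y) &= -\frac{x}{4}, & \frac{\partial}{\partial y} g_2(x, y) &= -y + 1.
\end{aligned}$$

Indeed, for all (x, y) satisfying -0.5 < x < 0.5 and 0.5 < y < 1.5, the partial
derivatives satisfy

$$\begin{aligned}
\left|\frac{\partial}{\partial x} g_1(x, y)\right| + \left|\frac{\partial}{\partial y} g_1(x, y)\right| &= |x| + |-0.5| < 1, \\
\left|\frac{\partial}{\partial x} g_2(x, y)\right| + \left|\frac{\partial}{\partial y} g_2(x, y)\right| &= \left|\frac{-x}{4}\right| + |-y + 1| < 0.625 < 1.
\end{aligned}$$

Therefore, the partial derivative conditions in (16) are met and Theorem 3.17 implies
that fixed-point iteration will converge to $(p, q) \approx (-0.2222146, 0.9938084)$. Notice
that near the other fixed point (1.90068, 0.31122) the partial derivatives do not meet
the conditions in (16); hence convergence is not guaranteed. That is,

$$\begin{aligned}
\left|\frac{\partial}{\partial x} g_1(1.90068, 0.31122)\right| + \left|\frac{\partial}{\partial y} g_1(1.90068, 0.31122)\right| &= 2.40068 > 1, \\
\left|\frac{\partial}{\partial x} g_2(1.90068, 0.31122)\right| + \left|\frac{\partial}{\partial y} g_2(1.90068, 0.31122)\right| &= 1.16395 > 1.
\end{aligned}$$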
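The two sums in condition (16) are easy to evaluate directly. The sketch below assumes the partial derivatives of the formulas in (5) are $\partial g_1/\partial x = x$, $\partial g_1/\partial y = -1/2$, $\partial g_2/\partial x = -x/4$, and $\partial g_2/\partial y = -y + 1$, obtained by differentiating $g_1 = (x^2 - y + 0.5)/2$ and $g_2 = (-x^2 - 4y^2 + 8y + 4)/8$; the names rowsums, near1, and near2 are illustrative.

```matlab
% Sketch: sums of |partial derivatives| in condition (16) for formulas (5).
rowsums = @(x,y) [abs(x) + abs(-0.5), abs(-x/4) + abs(-y + 1)];
near1 = rowsums(-0.2222146, 0.9938084);   % fixed point found by iteration (5)
near2 = rowsums(1.90068, 0.31122);        % fixed point that iteration (5) misses
disp(near1); disp(near2)
```

Both entries of near1 are below 1, while both entries of near2 exceed 1, which matches the observed convergence in case (i) and divergence in case (ii) of Table 3.5.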


Seidel Iteration 

An improvement of fixed-point iteration, analogous to the Gauss-Seidel method for linear
systems, can be made. Suppose that $p_{k+1}$ is used in the calculation of $q_{k+1}$
(in three dimensions both $p_{k+1}$ and $q_{k+1}$ are used to compute $r_{k+1}$). When these
modifications are incorporated in formulas (14) and (15), the method is called Seidel
iteration:

$$p_{k+1} = g_1(p_k, q_k) \quad\text{and}\quad q_{k+1} = g_2(p_{k+1}, q_k) \tag{18}$$

and

$$\begin{aligned}
p_{k+1} &= g_1(p_k, q_k, r_k) \\
q_{k+1} &= g_2(p_{k+1}, q_k, r_k) \\
r_{k+1} &= g_3(p_{k+1}, q_{k+1}, r_k).
\end{aligned} \tag{19}$$

Program 3.6 will implement Seidel iteration for nonlinear systems. Implementation
of fixed-point iteration is left for the reader.
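A two-variable version of the Seidel update can be sketched in a few lines. This example assumes the iteration functions are those of (5), namely $g_1 = (p^2 - q + 0.5)/2$ and $g_2 = (-p^2 - 4q^2 + 8q + 4)/8$, started at (0, 1); the fixed count of 25 iterations is an arbitrary choice for illustration.

```matlab
% Sketch: Seidel iteration (18) applied to the formulas in (5).
g1 = @(p,q) (p^2 - q + 0.5)/2;
g2 = @(p,q) (-p^2 - 4*q^2 + 8*q + 4)/8;
p = 0;  q = 1;               % starting value of case (i)
for k = 1:25                 % iteration count is an assumption
    p = g1(p, q);            % new p from the old pair
    q = g2(p, q);            % new q uses the updated p immediately
end
fprintf('%.7f %.7f\n', p, q)
```

The only difference from fixed-point iteration (14) is that the second assignment sees the freshly updated p rather than the previous one.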


Newton’s Method for Nonlinear Systems 

We now outline the derivation of Newton's method in two dimensions. This
method can easily be extended to higher dimensions.

Consider the system

$$\begin{aligned}
u &= f_1(x, y) \\
v &= f_2(x, y),
\end{aligned} \tag{20}$$

which can be considered a transformation from the xy-plane to the uv-plane. We are
interested in the behavior of this transformation near the point $(x_0, y_0)$ whose image
is the point $(u_0, v_0)$. If the two functions have continuous partial derivatives, then the
differential can be used to write a system of linear approximations that is valid near the
point $(x_0, y_0)$:

$$\begin{aligned}
u - u_0 &\approx \frac{\partial}{\partial x} f_1(x_0, y_0)(x - x_0) + \frac{\partial}{\partial y} f_1(x_0, y_0)(y - y_0), \\
v - v_0 &\approx \frac{\partial}{\partial x} f_2(x_0, y_0)(x - x_0) + \frac{\partial}{\partial y} f_2(x_0, y_0)(y - y_0).
\end{aligned} \tag{21}$$

The system (21) is a local linear transformation that relates small changes in the
independent variables to small changes in the dependent variables. When the Jacobian
matrix $J(x_0, y_0)$ is used, this relationship is easier to visualize:

$$\begin{bmatrix} u - u_0 \\ v - v_0 \end{bmatrix} \approx
\begin{bmatrix}
\dfrac{\partial}{\partial x} f_1(x_0, y_0) & \dfrac{\partial}{\partial y} f_1(x_0, y_0) \\[2mm]
\dfrac{\partial}{\partial x} f_2(x_0, y_0) & \dfrac{\partial}{\partial y} f_2(x_0, y_0)
\end{bmatrix}
\begin{bmatrix} x - x_0 \\ y - y_0 \end{bmatrix}. \tag{22}$$

If the system in (20) is written as a vector function V = F(X), the Jacobian
$J(x, y)$ is the two-dimensional analog of the derivative, because (22) can be written as

$$\Delta F \approx J(x_0, y_0)\,\Delta X. \tag{23}$$

We now use (23) to derive Newton's method in two dimensions.

Consider the system (20) with u and v set equal to zero:

$$\begin{aligned}
0 &= f_1(x, y) \\
0 &= f_2(x, y).
\end{aligned} \tag{24}$$

Suppose that (p, q) is a solution of (24); that is,

$$\begin{aligned}
0 &= f_1(p, q) \\
0 &= f_2(p, q).
\end{aligned} \tag{25}$$

To develop Newton's method for solving (24), we need to consider small changes
in the functions near the point $(p_0, q_0)$:

$$\begin{aligned}
\Delta u &= u - u_0, & \Delta p &= x - p_0, \\
\Delta v &= v - v_0, & \Delta q &= y - q_0.
\end{aligned} \tag{26}$$

Set (x, y) = (p, q) in (20) and use (25) to see that (u, v) = (0, 0). Hence the changes
in the dependent variables are

$$\begin{aligned}
u - u_0 &= f_1(p, q) - f_1(p_0, q_0) = 0 - f_1(p_0, q_0) \\
v - v_0 &= f_2(p, q) - f_2(p_0, q_0) = 0 - f_2(p_0, q_0).
\end{aligned} \tag{27}$$
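Use the result of (27) in (22) to get the linear transformation

$$\begin{bmatrix}
\dfrac{\partial}{\partial x} f_1(p_0, q_0) & \dfrac{\partial}{\partial y} f_1(p_0, q_0) \\[2mm]
\dfrac{\partial}{\partial x} f_2(p_0, q_0) & \dfrac{\partial}{\partial y} f_2(p_0, q_0)
\end{bmatrix}
\begin{bmatrix} \Delta p \\ \Delta q \end{bmatrix} \approx
-\begin{bmatrix} f_1(p_0, q_0) \\ f_2(p_0, q_0) \end{bmatrix}. \tag{28}$$

If the Jacobian $J(p_0, q_0)$ in (28) is nonsingular, we can solve for
$\Delta P = [\Delta p \;\; \Delta q]' = [p \;\; q]' - [p_0 \;\; q_0]'$ as follows:

$$\Delta P \approx -J(p_0, q_0)^{-1} F(p_0, q_0). \tag{29}$$

Then the next approximation $P_1$ to the solution P is

$$P_1 = P_0 + \Delta P = P_0 - J(p_0, q_0)^{-1} F(p_0, q_0). \tag{30}$$

Notice that (30) is the generalization of Newton's method for the one-variable case;
that is, $p_1 = p_0 - f(p_0)/f'(p_0)$.

Outline of Newton's Method

Suppose that $P_k$ has been obtained.

Step 1. Evaluate the function

$$F(P_k) = \begin{bmatrix} f_1(p_k, q_k) \\ f_2(p_k, q_k) \end{bmatrix}.$$

Step 2. Evaluate the Jacobian

$$J(P_k) = \begin{bmatrix}
\dfrac{\partial}{\partial x} f_1(p_k, q_k) & \dfrac{\partial}{\partial y} f_1(p_k, q_k) \\[2mm]
\dfrac{\partial}{\partial x} f_2(p_k, q_k) & \dfrac{\partial}{\partial y} f_2(p_k, q_k)
\end{bmatrix}.$$

Step 3. Solve the linear system

$$J(P_k)\,\Delta P = -F(P_k) \quad\text{for } \Delta P.$$

Step 4. Compute the next point:

$$P_{k+1} = P_k + \Delta P.$$

Now, repeat the process.

Example 3.32. Consider the nonlinear system

$$\begin{aligned}
0 &= x^2 - 2x - y + 0.5 \\
0 &= x^2 + 4y^2 - 4.
\end{aligned}$$

Use Newton's method with the starting value $(p_0, q_0) = (2.00, 0.25)$ and compute $(p_1, q_1)$,
$(p_2, q_2)$, and $(p_3, q_3)$.

The function vector and Jacobian matrix are

$$F(x, y) = \begin{bmatrix} x^2 - 2x - y + 0.5 \\ x^2 + 4y^2 - 4 \end{bmatrix}, \qquad
J(x, y) = \begin{bmatrix} 2x - 2 & -1 \\ 2x & 8y \end{bmatrix}.$$

At the point (2.00, 0.25) they take on the values

$$F(2.00, 0.25) = \begin{bmatrix} 0.25 \\ 0.25 \end{bmatrix}, \qquad
J(2.00, 0.25) = \begin{bmatrix} 2.0 & -1.0 \\ 4.0 & 2.0 \end{bmatrix}.$$

The differentials $\Delta p$ and $\Delta q$ are solutions of the linear system

$$\begin{bmatrix} 2.0 & -1.0 \\ 4.0 & 2.0 \end{bmatrix}
\begin{bmatrix} \Delta p \\ \Delta q \end{bmatrix} = -\begin{bmatrix} 0.25 \\ 0.25 \end{bmatrix}.$$

A straightforward calculation reveals that

$$\Delta P = \begin{bmatrix} \Delta p \\ \Delta q \end{bmatrix} = \begin{bmatrix} -0.09375 \\ 0.0625 \end{bmatrix}.$$

The next point in the iteration is

$$P_1 = P_0 + \Delta P = \begin{bmatrix} 2.00 \\ 0.25 \end{bmatrix} + \begin{bmatrix} -0.09375 \\ 0.0625 \end{bmatrix} = \begin{bmatrix} 1.90625 \\ 0.3125 \end{bmatrix}.$$

Similarly, the next two points are

$$P_2 = \begin{bmatrix} 1.900691 \\ 0.311213 \end{bmatrix} \quad\text{and}\quad
P_3 = \begin{bmatrix} 1.900677 \\ 0.311219 \end{bmatrix}.$$

The coordinates of $P_3$ are accurate to six decimal places. Calculations for finding $P_2$ and
$P_3$ are summarized in Table 3.7.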
Table 3.7 Function Values, Jacobian Matrices, and Differentials Required for Each
Iteration in Newton's Solution to Example 3.32

Each row lists $P_k$, the linear system $J(P_k)\,\Delta P = -F(P_k)$ with its solution, and $P_k + \Delta P$.
Column vectors are written [a; b] and matrices [a b; c d].

  P_k                     J(P_k)                                  ΔP                        -F(P_k)                   P_k + ΔP
  [2.00; 0.25]            [2.0 -1.0; 4.0 2.0]                     [-0.09375; 0.0625]        -[0.25; 0.25]             [1.90625; 0.3125]
  [1.90625; 0.3125]       [1.8125 -1.0; 3.8125 2.5]               [-0.005559; -0.001287]    -[0.008789; 0.024414]     [1.900691; 0.311213]
  [1.900691; 0.311213]    [1.801381 -1.000000; 3.801381 2.489700] [-0.000014; 0.000006]     -[0.000031; 0.000038]     [1.900677; 0.311219]



Implementation of Newton's method can require the determination of several partial
derivatives. It is permissible to use numerical approximations for the values of
these partial derivatives, but care must be taken to determine the proper step size. In
higher dimensions it is necessary to use the methods for solving linear systems introduced
earlier in this chapter to solve for $\Delta P$.
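The four steps of Newton's method, together with a numerical approximation of the partial derivatives, can be sketched as follows for the system of Example 3.32. The forward-difference formula, the step size h = 1e-6, and the names Japprox, P0, and E are assumptions made for illustration; the source does not prescribe a particular difference scheme.

```matlab
% Sketch: three Newton steps for Example 3.32, plus a forward-difference
% approximation of the Jacobian (the step size h is an assumption).
F  = @(X) [X(1)^2 - 2*X(1) - X(2) + 0.5; X(1)^2 + 4*X(2)^2 - 4];
JF = @(X) [2*X(1) - 2, -1; 2*X(1), 8*X(2)];
P = [2.00; 0.25];
for k = 1:3
    dP = -(JF(P)\F(P));      % solve J(P_k) dP = -F(P_k)
    P = P + dP;              % next point P_{k+1}
end
h = 1e-6;  P0 = [2.00; 0.25];  Japprox = zeros(2);
for j = 1:2
    E = zeros(2,1);  E(j) = h;
    Japprox(:,j) = (F(P0 + E) - F(P0))/h;   % column j of the Jacobian
end
disp(P'); disp(Japprox)
```

After three steps P agrees with the third row of Table 3.7 to about six decimal places, and Japprox is close to the exact Jacobian at (2.00, 0.25). A much smaller h would eventually lose accuracy to rounding, which is the step-size caution raised in the paragraph above.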

MATLAB 

Programs 3.6 (Nonlinear Seidel Iteration) and 3.7 (Newton-Raphson Method) will
require saving the nonlinear system X = G(X), and the nonlinear system F(X) = 0
and its Jacobian matrix, JF, respectively, as M-files. As an example consider saving
the nonlinear system in Example 3.32 and the related Jacobian matrix as the M-files
F.m and JF.m, respectively.

function Z=F(X)                 function W=JF(X)
x=X(1);y=X(2);                  x=X(1);y=X(2);
Z=zeros(1,2);                   W=[2*x-2 -1;2*x 8*y];
Z(1)=x^2-2*x-y+0.5;
Z(2)=x^2+4*y^2-4;

The functions may be evaluated using the standard MATLAB commands.

>>A=feval('F',[2.00 0.25])
A=
    0.2500    0.2500
>>V=JF([2.00 0.25])
V=
     2    -1
     4     2
Program 3.6 (Nonlinear Seidel Iteration). To solve the nonlinear fixed-point
system X = G(X), given one initial approximation $P_0$, and generating a sequence $\{P_k\}$
that converges to the solution P.

function [P,iter]=seidel(G,P,delta,max1)
%Input  - G is the nonlinear system saved in the M-file G.m
%       - P is the initial guess at the solution
%       - delta is the error bound
%       - max1 is the number of iterations
%Output - P is the seidel approximation to the solution
%       - iter is the number of iterations required
N=length(P);
for k=1:max1
   X=P;
   % X is the kth approximation to the solution
   for j=1:N
      A=feval('G',X);
      % Update the terms of X as they are calculated
      X(j)=A(j);
   end
   err=abs(norm(X-P));
   relerr=err/(norm(X)+eps);
   P=X;
   iter=k;
   if (err<delta)|(relerr<delta)
      break
   end
end

In the following program the MATLAB command A\B is used to solve the linear
system AX = B (see Q=P-(J\Y')'). Programs developed earlier in this chapter could
be used in place of this MATLAB command. The choice of an appropriate program
to solve the linear system would depend on the size and characteristics of the Jacobian
matrix.


Program 3.7 (Newton-Raphson Method). To solve the nonlinear system
F(X) = 0, given one initial approximation $P_0$ and generating a sequence $\{P_k\}$
that converges to the solution P.

function [P,iter,err]=newdim(F,JF,P,delta,epsilon,max1)
%Input  - F is the system saved as the M-file F.m
%       - JF is the Jacobian of F saved as the M-file JF.M
%       - P is the initial approximation to the solution
%       - delta is the tolerance for P
%       - epsilon is the tolerance for F(P)
%       - max1 is the maximum number of iterations
%Output - P is the approximation to the solution
%       - iter is the number of iterations required
%       - err is the error estimate for P
Y=feval(F,P);
for k=1:max1
   J=feval(JF,P);
   Q=P-(J\Y')';
   Z=feval(F,Q);
   err=norm(Q-P);
   relerr=err/(norm(Q)+eps);
   P=Q;
   Y=Z;
   iter=k;
   if (err<delta)|(relerr<delta)|(abs(Y)<epsilon)
      break
   end
end

Exercises for Iteration for Nonlinear Systems

1. Find (analytically) the fixed point(s) for each of the following systems.

(a) $x = g_1(x, y) = x - y^2$
    $y = g_2(x, y) = -x + 6y$

(b) $x = g_1(x, y) = (x^2 - y^2 - x - 3)/3$
    $y = g_2(x, y) = (-x + y - 1)/3$

(c) $x = g_1(x, y) = \sin(y)$
    $y = g_2(x, y) = -6x + y$

(d) $x = g_1(x, y, z) = 9 - 3y - 2z$
    $y = g_2(x, y, z) = 2 - x + z$
    $z = g_3(x, y, z) = -9 + 3x + 4y - z$

2. Find (analytically) the zero(s) for each of the following systems. Evaluate the
Jacobian of each system at each zero.

(a) $0 = f_1(x, y) = 2x + y - 6$
    $0 = f_2(x, y) = x + 2y$

(b) $0 = f_1(x, y) = 3x^2 + 2y - 4$
    $0 = f_2(x, y) = 2x + 2y - 3$
circle for Exercise 5.


(c) $0 = f_1(x, y) = 2x - 4\cos(y)$
    $0 = f_2(x, y) = 4x\sin(y)$

(d) $0 = f_1(x, y, z) = x^2 + y^2 - z$
    $0 = f_2(x, y, z) = x^2 + y^2 + z^2 - 1$
    $0 = f_3(x, y, z) = x + y$

3. Find a region in the xy-plane such that if $(p_0, q_0)$ is in the region then fixed-point
iteration is guaranteed to converge (use an argument similar to the one that followed
Theorem 3.17) for the system:

$$\begin{aligned}
x &= g_1(x, y) = (x^2 - y^2 - x - 3)/3 \\
y &= g_2(x, y) = (x + y + 1)/3.
\end{aligned}$$

4. Rewrite the following linear system in fixed-point form. Find bounds on x, y, and z
such that fixed-point iteration is sure to converge for any initial guess $(p_0, q_0, r_0)$ that
satisfies the boundary conditions.

$$\begin{aligned}
6x + y + z &= 1 \\
x + 4y + z &= 2 \\
x + y + 5z &= 0
\end{aligned}$$


5. For the given nonlinear system, use the initial approximation $(p_0, q_0) = (1.1, 2.0)$,
and compute the next three approximations to the fixed point using (a) fixed-point
iteration and equations (14) and (b) Seidel iteration using equations (18).

$$\begin{aligned}
x &= g_1(x, y) = \frac{8x - 4x^2 + y^2 + 1}{8} && \text{(hyperbola)} \\
y &= g_2(x, y) = \frac{2x - x^2 + 4y - y^2 + 3}{4} && \text{(circle)}.
\end{aligned}$$

6. For the following nonlinear system, use the initial approximation $(p_0, q_0) = (-0.3, -1.3)$,
and compute the next three approximations to the fixed point using (a) fixed-point
iteration and equations (14) and (b) Seidel iteration using equations (18).


x = gi(x, y) = 


v-x 3 +3 x 2 + 3x 


y = gi(x, y) = 


y 2 + 2y - x - 2 


(cubic) 


(parabola). 


7. Consider the nonlinear system

$$\begin{aligned}
0 &= f_1(x, y) = x^2 - y - 0.2 \\
0 &= f_2(x, y) = y^2 - x - 0.3.
\end{aligned}$$

These parabolas intersect in two points as shown in Figure 3.9.

and $(p_2, q_2)$.

(b) Start with $(p_0, q_0) = (-0.2, -0.2)$ and apply Newton's method to compute
$(p_1, q_1)$ and $(p_2, q_2)$.

8. Consider the nonlinear system shown in Figure 3.10.

$$\begin{aligned}
0 &= f_1(x, y) = x^2 + y^2 - 2 \\
0 &= f_2(x, y) = xy - 1.
\end{aligned}$$

(a) Verify that the solutions are (1, 1) and (-1, -1).

(b) What difficulties might arise if we try to use Newton's method to find the
solutions?

9. Show that Jacobi iteration for a 3 x 3 linear system is a special case of fixed-point 
iteration (15). Furthermore, verify that if the coefficient matrix from a 3 x 3 linear 
system is strictly diagonally dominant then condition (17) is satisfied. 
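Figure 3.9 The parabolas for Exercise 7.

Figure 3.10 The circle and hyperbola for Exercise 8.

10. Newton's method for two equations can be written in fixed-point iteration form

$$x = g_1(x, y), \qquad y = g_2(x, y),$$

where $g_1(x, y)$ and $g_2(x, y)$ are given by

$$\begin{aligned}
g_1(x, y) &= x - \frac{f_1(x, y)\frac{\partial}{\partial y} f_2(x, y) - f_2(x, y)\frac{\partial}{\partial y} f_1(x, y)}{\det(J(x, y))} \\
g_2(x, y) &= y - \frac{f_2(x, y)\frac{\partial}{\partial x} f_1(x, y) - f_1(x, y)\frac{\partial}{\partial x} f_2(x, y)}{\det(J(x, y))}.
\end{aligned}$$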

11. Fixed-point iteration is used to solve the nonlinear system (12). Use the following
steps to prove that conditions in (16) are sufficient to guarantee that $\{(p_k, q_k)\}$
converges to (p, q). Assume that there is a constant K with 0 < K < 1 so that

$$\begin{aligned}
\left|\frac{\partial}{\partial x} g_1(x, y)\right| + \left|\frac{\partial}{\partial y} g_1(x, y)\right| &< K \\
\left|\frac{\partial}{\partial x} g_2(x, y)\right| + \left|\frac{\partial}{\partial y} g_2(x, y)\right| &< K
\end{aligned}$$

for all (x, y) in the rectangle $R = \{(x, y) : a < x < b,\; c < y < d\}$. Also assume
that $a < p_0 < b$ and $c < q_0 < d$. Define

$$e_k = p - p_k, \quad E_k = q - q_k, \quad\text{and}\quad r_k = \max\{|e_k|, |E_k|\}.$$

Use the following form of the Mean Value Theorem applied to functions of two
variables:

$$\begin{aligned}
e_{k+1} &= \frac{\partial}{\partial x} g_1(a_k^*, q_k)\,e_k + \frac{\partial}{\partial y} g_1(p, c_k^*)\,E_k \\
E_{k+1} &= \frac{\partial}{\partial x} g_2(b_k^*, q_k)\,e_k + \frac{\partial}{\partial y} g_2(p, d_k^*)\,E_k,
\end{aligned}$$

where $a_k^*$ and $b_k^*$ lie in [a, b] and $c_k^*$ and $d_k^*$ lie in [c, d]. Prove the following:

(a) $|e_1| \le K r_0$ and $|E_1| \le K r_0$

(b) $|e_2| \le K r_1 \le K^2 r_0$ and $|E_2| \le K r_1 \le K^2 r_0$

(c) $|e_k| \le K r_{k-1} \le K^k r_0$ and $|E_k| \le K r_{k-1} \le K^k r_0$

(d) $\lim_{k \to \infty} p_k = p$ and $\lim_{k \to \infty} q_k = q$

12. As noted earlier, the Jacobian matrix of system (20) is the two-dimensional analog
of the derivative. Write system (20) as a vector function V = F(X), and let J(F)
be the Jacobian matrix of this system. Given two nonlinear systems V = F(X) and
V = G(X) and the real number c, prove:

(a) $J(cF(X)) = cJ(F(X))$

(b) $J(F(X) + G(X)) = J(F(X)) + J(G(X))$


Algorithms and Programs 


1. Use Program 3.6 to approximate the fixed points of the systems in Exercises 5 a 
Answers should be accurate to 10 decimal places. 

2. Use Program 3.7 to approximate the zeros of the systems in Exercises 7 and 8.
Answers should be accurate to 10 decimal places.

3. Construct a program to find the fixed points of a system using fixed-point iteration.
Use the program to approximate the fixed points of the systems in Exercises 5 a 
Answers should be accurate to 8 decimal places. 

4. Use Program 3.7 to approximate the zeros of the following systems. Answers should
be accurate to 10 decimal places.

(a) $0 = x^2 - x + y^2 + z^2 - 5$
    $0 = x^2 + y^2 - y + z^2 - 4$
    $0 = x^2 + y^2 + z^2 + z - 6$

(b) $0 = x^2 - x + 2y^2 + yz - 10$
    $0 = 5x - 6y + z$
    $0 = z - x^2 - y^2$

(c) $0 = (x + 1)^2 + (y + 1)^2 - z$
    $0 = (x - 1)^2 + y^2 - z$
    $0 = 4x^2 + 2y^2 + z^2 - 16$

(d) $0 = 9x^2 + 36y^2 + 4z^2 - 36$
    $0 = x^2 - 2y^2 - 20z$
    $0 = 16x - x^3 - 2y^2 - 16z^2$

5. We wish to solve the nonlinear system

$$\begin{aligned}
0 &= 7x^3 - 10x - y - 1 \\
0 &= 8y^3 - 11y + x - 1.
\end{aligned}$$

Use MATLAB to sketch the graphs of both curves on the same coordinate system.
Use the graph to verify that there are nine points where the graphs intersect. Using
the graph, estimate the points of intersection. Use these estimates and Program 3.7 to
approximate the points of intersection to 9 decimal places.

6. The system in Problem 5 can be rewritten in fixed-point form:

$$\begin{aligned}
x &= \frac{7x^3 - y - 1}{10} \\
y &= \frac{8y^3 + x - 1}{11}.
\end{aligned}$$

Do some computer experimentation. Discover that, no matter what starting value is
used, only one of the nine solutions can be found using fixed-point iteration (on this
particular fixed-point form). Are there other fixed-point forms of the system in Problem 5
that could be used to find other solutions of the system?
Interpolation and Polynomial Approximation


The computational procedures used in computer software for the evaluation of a library
function, such as sin(x), cos(x), or $e^x$, involve polynomial approximation. The
state-of-the-art methods use rational functions (which are the quotients of polynomials).
However, the theory of polynomial approximation is suitable for a first course
in numerical analysis, and we will mainly consider polynomials in this chapter. Suppose that
the function $f(x) = e^x$ is to be approximated by a polynomial of degree n = 2 over
the interval [-1, 1]. The Taylor polynomial is shown in Figure 4.1(a) and can be
contrasted with the Chebyshev approximation in Figure 4.1(b). The maximum error for
the Taylor approximation is 0.218282, whereas the maximum error for the Chebyshev
polynomial is 0.056468. In this chapter we develop the basic theory needed to investigate
these matters.

Figure 4.1 (a) The Taylor polynomial $p(x) = 1.000000 + 1.000000x + 0.500000x^2$,
which approximates $f(x) = e^x$ over [-1, 1]. (b) The Chebyshev
approximation $q(x) = 1.000000 + 1.129772x + 0.532042x^2$ for $f(x) = e^x$ over [-1, 1].

Figure 4.2 The graph of the collocation polynomial that passes
through (1, 2), (2, 1), (3, 5), (4, 6), and (5, 1).
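The two maximum errors quoted above can be estimated by sampling both quadratics on a fine grid. This is only a sketch: the grid of 2001 equally spaced points and the variable names are assumptions, and a sampled maximum is an estimate of the true maximum rather than a proof of it.

```matlab
% Sketch: maximum errors of the two quadratics on [-1, 1], sampled on a grid.
x = linspace(-1, 1, 2001);             % grid density is an assumption
p = 1 + x + 0.5*x.^2;                  % Taylor polynomial of Figure 4.1(a)
q = 1 + 1.129772*x + 0.532042*x.^2;    % Chebyshev approximation of Figure 4.1(b)
errTaylor = max(abs(exp(x) - p));
errCheb   = max(abs(exp(x) - q));
fprintf('%.6f %.6f\n', errTaylor, errCheb)
```

For both polynomials the largest sampled error occurs at the endpoint x = 1, and the Chebyshev approximation reduces it by roughly a factor of four.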

An associated problem involves the construction of the collocation polynomial.
Given n + 1 points in the plane (no two of which are aligned vertically), the collocation
polynomial is the unique polynomial of degree $\le n$ that passes through the points.
In cases where data are known to a high degree of precision, the collocation polynomial
is sometimes used to find a polynomial that passes through the given data points.
A variety of methods can be used to construct the collocation polynomial: solving a
linear system for its coefficients, the use of Lagrange coefficient polynomials, and the
construction of a divided differences table and the coefficients of the Newton polynomial.
All three techniques are important for a practitioner of numerical analysis to
know. For example, the collocation polynomial of degree n = 4 that passes through
the five points (1, 2), (2, 1), (3, 5), (4, 6), and (5, 1) is

$$P(x) = \frac{5x^4 - 82x^3 + 427x^2 - 806x + 504}{24},$$

and a graph showing both the points and the polynomial is given in Figure 4.2.
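Of the three construction methods just listed, solving a linear system for the coefficients is the quickest to try. The sketch below builds the Vandermonde matrix for the five points and solves for the coefficients with the backslash operator; the variable names are illustrative, and for many points or widely spread abscissas this matrix becomes ill-conditioned, so the approach is shown for illustration rather than as a recommended general method.

```matlab
% Sketch: coefficients of the collocation polynomial through five points.
x = [1 2 3 4 5]';  y = [2 1 5 6 1]';
V = vander(x);          % columns are x.^4, x.^3, x.^2, x, 1
c = V\y;                % coefficients in descending powers of x
disp((24*c)')           % compare with 5, -82, 427, -806, 504
```

Multiplying the computed coefficients by 24 recovers the integer numerator coefficients of the polynomial displayed above, up to rounding error.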




4.1 Taylor Series and Calculation of Functions

Limit processes are the basis of calculus. For example, the derivative

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

is the limit of the difference quotient where both the numerator and the denominator
go to zero. A Taylor series illustrates another type of limit process. In this case an
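Table 4.1 Taylor Series Expansions for Some Common Functions

$$\begin{aligned}
\sin(x) &= x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots && \text{for all } x \\
\cos(x) &= 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots && \text{for all } x \\
e^x &= 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots && \text{for all } x \\
\ln(1 + x) &= x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots && -1 < x \le 1 \\
\arctan(x) &= x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots && -1 \le x \le 1 \\
(1 + x)^p &= 1 + px + \frac{p(p-1)}{2!}x^2 + \frac{p(p-1)(p-2)}{3!}x^3 + \cdots && \text{for } |x| < 1
\end{aligned}$$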



Table 4.2 Partial Sums $S_n$ Used to Determine e

   n    S_n = 1 + 1/1! + 1/2! + ... + 1/n!
   0    1.0
   1    2.0
   2    2.5
   3    2.666666666666...
   4    2.708333333333...
   5    2.716666666666...
   6    2.718055555555...
   7    2.718253968254...
   8    2.718278769841...
   9    2.718281525573...
  10    2.718281801146...
  11    2.718281826199...
  12    2.718281828286...
  13    2.718281828447...
  14    2.718281828458...
  15    2.718281828459...


infinite number of terms is added together by taking the limit of certain partial sums.
An important application is their use to represent the elementary functions: sin(x),
cos(x), $e^x$, ln(x), etc. Table 4.1 gives several of the common Taylor series expansions.
The partial sums can be accumulated until an approximation to the function is obtained
that has the accuracy specified. Series solutions are used in the areas of engineering
and physics.

We want to learn how a finite sum can be used to obtain a good approximation
to an infinite sum. For illustration we shall use the exponential series in Table 4.1 to
compute the number $e = e^1$, which is the base of the natural logarithm and exponential
functions. Here we choose x = 1 and use the series

$$e^1 = 1 + \frac{1}{1!} + \frac{1^2}{2!} + \frac{1^3}{3!} + \frac{1^4}{4!} + \cdots + \frac{1^k}{k!} + \cdots.$$

The definition for the sum of an infinite series in Section 1.1 requires that the partial
sums $S_N$ tend to a limit. The values of these sums are given in Table 4.2.
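The partial sums of the series for e can be generated with a cumulative sum. This is a minimal sketch in double precision; the variable names n and S are illustrative, and the last printed digits are subject to rounding in floating-point arithmetic.

```matlab
% Sketch: partial sums S_0, ..., S_15 of the series for e.
n = 0:15;
S = cumsum(1./factorial(n));     % S(k+1) holds the partial sum S_k
fprintf('%2d  %.12f\n', [n; S])  % one line per partial sum
```

Each pass of fprintf consumes one column of the 2 x 16 array, so the output pairs every index n with its partial sum.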

A natural way to think about the power series representation of a function is to 
view the expansion as the limiting case of polynomials of increasing degree. If enough 
terms are added, then an accurate approximation will be obtained. This needs to be 
made precise. What degree should be chosen for the polynomial, and how do we 
calculate the coefficients for the powers of x in the polynomial? Theorem 4.1 answers 
these questions. 


Theorem 4.1 (Taylor Polynomial Approximation). Assume that $f \in C^{N+1}[a, b]$
and $x_0 \in [a, b]$ is a fixed value. If $x \in [a, b]$, then

$$f(x) = P_N(x) + E_N(x), \tag{1}$$

where $P_N(x)$ is a polynomial that can be used to approximate f(x):

$$f(x) \approx P_N(x) = \sum_{k=0}^{N} \frac{f^{(k)}(x_0)}{k!}(x - x_0)^k. \tag{2}$$

The error term $E_N(x)$ has the form

$$E_N(x) = \frac{f^{(N+1)}(c)}{(N+1)!}(x - x_0)^{N+1} \tag{3}$$

for some value c = c(x) that lies between x and $x_0$.

Proof. The proof is left as an exercise.

Relation (2) indicates how the coefficients of the Taylor polynomial are calculated.
Although the error term (3) involves a similar expression, notice that $f^{(N+1)}(c)$ is to be
evaluated at an undetermined number c that depends on the value of x. For this reason
we do not try to evaluate $E_N(x)$; it is used to determine a bound for the accuracy of
the approximation.
Example 4.1. Show why 15 terms are all that are needed to obtain the 13-digit
approximation $e = 2.718281828459$ in Table 4.2.

Expand $f(x) = e^x$ in a Taylor polynomial of degree 15 using the fixed value $x_0 = 0$
and involving the powers $(x - 0)^k = x^k$. The derivatives required are
$f'(x) = f''(x) = \cdots = f^{(16)}(x) = e^x$. The first 15 derivatives are used to calculate
the coefficients $a_k = e^0/k!$ and are used to write

(4)    $P_{15}(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^{15}}{15!}.$

Setting $x = 1$ in (4) gives the partial sum $S_{15} = P_{15}(1)$. The remainder term is needed to
show the accuracy of the approximation:

(5)    $E_{15}(x) = \frac{f^{(16)}(c)\,x^{16}}{16!}.$

Since we chose $x_0 = 0$ and $x = 1$, the value $c$ lies between them (i.e., $0 < c < 1$), which
implies that $e^c < e^1$. Notice that the partial sums in Table 4.2 are bounded above by 3.
Combining these two inequalities yields $e^c < 3$, which is used in the following calculation:

$|E_{15}(1)| = \frac{|f^{(16)}(c)|}{16!} \le \frac{e^c}{16!} < \frac{3}{16!} < 1.433844 \times 10^{-13}.$

Therefore, all the digits in the approximation $e \approx 2.718281828459$ are correct, because the
actual error (whatever it is) must be less than 2 in the thirteenth decimal place. ■
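The bound in Example 4.1 can be checked numerically. The following is a minimal Python sketch (not MATLAB); the variable names are illustrative, and `math.e` is used as the reference value of $e$, which is itself only a double-precision approximation.

```python
import math

# Error bound from Example 4.1: e^c < 3 for 0 < c < 1, so |E_15(1)| < 3/16!.
bound = 3 / math.factorial(16)

# Partial sum S_15 = P_15(1) = sum_{k=0}^{15} 1/k!.
s15 = sum(1 / math.factorial(k) for k in range(16))

# Observed error, measured against the library constant math.e.
actual = abs(math.e - s15)

print(f"bound 3/16!     = {bound:.6e}")
print(f"observed error  = {actual:.3e}")
print(f"S_15            = {s15:.12f}")
```

The observed error is smaller than the bound, as it must be, since the estimate $e^c < 3$ is deliberately crude.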


Instead of giving a rigorous proof of Theorem 4.1, we shall discuss some of the
features of the approximation; the reader can look in any standard reference text on
calculus for more details. For illustration, we again use the function $f(x) = e^x$ and
the value $x_0 = 0$. From elementary calculus we know that the slope of the curve
$y = e^x$ at the point $(x, e^x)$ is $f'(x) = e^x$. Hence the slope at the point $(0, 1)$ is
$f'(0) = 1$. Therefore, the tangent line to the curve at the point $(0, 1)$ is $y = 1 + x$.
This is the same formula that would be obtained if we used $N = 1$ in Theorem 4.1;
that is, $P_1(x) = f(0) + f'(0)x/1! = 1 + x$. Therefore, $P_1(x)$ is the equation of the
tangent line to the curve. The graphs are shown in Figure 4.3.

Observe that the approximation $e^x \approx 1 + x$ is good near the center $x_0 = 0$ and that
the distance between the curves grows as $x$ moves away from 0. Notice that the slopes
of the curves agree at $(0, 1)$. In calculus we learned that the second derivative indicates
whether a curve is concave up or down. The study of curvature¹ shows that if two
curves $y = f(x)$ and $y = g(x)$ have the property that $f(x_0) = g(x_0)$, $f'(x_0) = g'(x_0)$,
and $f''(x_0) = g''(x_0)$, then they have the same curvature at $x_0$. This property would be
desirable for a polynomial function that approximates $f(x)$. Corollary 4.1 shows that
the Taylor polynomial has this property for $N \ge 2$.

¹ The curvature $K$ of a graph $y = f(x)$ at $(x_0, y_0)$ is defined by $K = |f''(x_0)| / (1 + [f'(x_0)]^2)^{3/2}$.
Corollary 4.1. If $P_N(x)$ is the Taylor polynomial of degree $N$ given in Theorem 4.1,
then

(6)    $P_N^{(k)}(x_0) = f^{(k)}(x_0)$  for $k = 0, 1, \ldots, N$.

Proof. Set $x = x_0$ in equations (2) and (3), and the result is $P_N(x_0) = f(x_0)$. Thus
statement (6) is true for $k = 0$. Now differentiate the right-hand side of (2) and get

(7)    $P_N'(x) = \sum_{k=1}^{N} \frac{f^{(k)}(x_0)}{(k-1)!}(x - x_0)^{k-1} = \sum_{k=0}^{N-1} \frac{f^{(k+1)}(x_0)}{k!}(x - x_0)^k.$

Set $x = x_0$ in (7) to obtain $P_N'(x_0) = f'(x_0)$. Thus statement (6) is true for $k = 1$.

Successive differentiations of (7) will establish the other identities in (6). The details
are left as an exercise. •

Applying Corollary 4.1, we see that $y = P_2(x)$ has the properties $f(x_0) = P_2(x_0)$,
$f'(x_0) = P_2'(x_0)$, and $f''(x_0) = P_2''(x_0)$; hence the graphs have the same curvature
at $x_0$. For example, consider $f(x) = e^x$ and $P_2(x) = 1 + x + x^2/2$. The graphs are
shown in Figure 4.4 and it is seen that they curve up in the same fashion at $(0, 1)$.

In the theory of approximation, one seeks to find an accurate polynomial
approximation to the analytic function² $f(x)$ over $[a, b]$. This is one technique used in
developing computer software. The accuracy of a Taylor polynomial is increased when
we choose $N$ large. The accuracy of any given polynomial will generally decrease as
the value of $x$ moves away from the center $x_0$. Hence we must choose $N$ large enough
and restrict the maximum value of $|x - x_0|$ so that the error does not exceed a specified
bound. If we choose the interval width to be $2R$ and $x_0$ in the center (i.e., $|x - x_0| \le R$),
then the absolute value of the error satisfies relation (8).

² The function $f(x)$ is analytic at $x_0$ if it has continuous derivatives of all orders and can be
represented as a Taylor series in an interval about $x_0$.
Table 4.3 Values for the Error Bound $|\text{error}| < e^R R^{N+1}/(N+1)!$ Using the
Approximation $e^x \approx P_N(x)$ for $|x| \le R$

                        R = 2.0,       R = 1.5,       R = 1.0,       R = 0.5,
                        |x| <= 2.0     |x| <= 1.5     |x| <= 1.0     |x| <= 0.5

$e^x \approx P_5(x)$    0.65680499     0.07090172     0.00377539     0.00003578
$e^x \approx P_6(x)$    0.18765857     0.01519323     0.00053934     0.00000256
$e^x \approx P_7(x)$    0.04691464     0.00284873     0.00006742     0.00000016
$e^x \approx P_8(x)$    0.01042548     0.00047479     0.00000749     0.00000001


(8)    $|\text{error}| = |E_N(x)| \le \frac{M R^{N+1}}{(N+1)!},$

where $M = \max\{|f^{(N+1)}(z)| : x_0 - R \le z \le x_0 + R\}$. If the derivatives
are uniformly bounded, the error bound in (8) is proportional to $R^{N+1}/(N+1)!$ and
decreases as $R$ goes to zero (with $N$ fixed) or as $N$ gets large. Table 4.3 shows how the
choices of these two parameters affect the accuracy of the approximation $e^x \approx P_N(x)$
over the interval $|x| \le R$. The error is smallest when $N$ is largest and $R$ smallest.
Graphs for $P_2$, $P_3$, and $P_4$ are given in Figure 4.5.
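The entries of Table 4.3 follow directly from (8) with $M = e^R$. The sketch below recomputes them in Python (not MATLAB); the helper name `taylor_exp_bound` is hypothetical and chosen only for this illustration.

```python
import math

def taylor_exp_bound(n, r):
    """Error bound e^R * R^(N+1) / (N+1)! from relation (8) with M = e^R."""
    return math.exp(r) * r ** (n + 1) / math.factorial(n + 1)

# One row per degree N, one column per radius R, as in Table 4.3.
radii = (2.0, 1.5, 1.0, 0.5)
for n in (5, 6, 7, 8):
    row = "  ".join(f"{taylor_exp_bound(n, r):.8f}" for r in radii)
    print(f"P_{n}: {row}")
```

Reading across a row shows the effect of shrinking $R$; reading down a column shows the effect of raising $N$.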


Example 4.2. Establish the error bounds for the approximation $e^x \approx P_8(x)$ on each of
the intervals $|x| \le 1.0$ and $|x| \le 0.5$.

If $|x| \le 1.0$, then letting $R = 1.0$ and $|f^{(9)}(c)| = |e^c| \le e^{1.0} = M$ in (8) implies that

$|\text{error}| = |E_8(x)| \le \frac{e^{1.0}(1.0)^9}{9!} \approx 0.00000749.$

If $|x| \le 0.5$, then letting $R = 0.5$ and $|f^{(9)}(c)| = e^c \le e^{0.5} = M$ in (8) implies that

$|\text{error}| = |E_8(x)| \le \frac{e^{0.5}(0.5)^9}{9!} \approx 0.00000001.$ ■

Example 4.3. If $f(x) = e^x$, show that $N = 9$ is the smallest integer, so that the
$|\text{error}| = |E_N(x)| \le 0.0000005$ for $x$ in $[-1, 1]$. Hence $P_9(x)$ can be used to compute
approximate values of $e^x$ that will be accurate in the sixth decimal place.

We need to find the smallest integer $N$ so that

$|\text{error}| = |E_N(x)| \le \frac{e^c (1)^{N+1}}{(N+1)!} < 0.0000005.$

In Example 4.2 we saw that $N = 8$ was too small, so we try $N = 9$ and discover
that $|E_N(x)| \le e^1 (1)^{9+1}/(9+1)! \le 0.000000749$. This value is slightly larger than
desired; hence we would be likely to choose $N = 10$. But we used $e^c \le e^1$ as a crude
estimate in finding the error bound. Hence 0.000000749 is a little larger than the actual
error. Figure 4.6 shows a graph of $E_9(x) = e^x - P_9(x)$. Notice that the maximum vertical
range is about $3 \times 10^{-7}$ and occurs at the right end point $(1, E_9(1))$. Indeed, the maximum
error on the interval is $E_9(1) = 2.718281828 - 2.718281526 \approx 3.02 \times 10^{-7}$. Therefore,
$N = 9$ is justified. ■
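The reasoning of Example 4.3 can be reproduced numerically. The Python sketch below (not MATLAB) compares the crude bound from (8) with the error actually observed on a grid of sample points in $[-1, 1]$. The names `p_n` and `bound` and the grid spacing are assumptions made for illustration, and a sampled maximum is only an estimate of the true maximum.

```python
import math

def p_n(x, n):
    """Taylor polynomial of e^x about x0 = 0, of degree n."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

def bound(n, r=1.0):
    """Crude bound e^R * R^(N+1) / (N+1)! from relation (8)."""
    return math.exp(r) * r ** (n + 1) / math.factorial(n + 1)

tol = 0.0000005
for n in (8, 9, 10):
    print(f"N = {n:2d}  bound = {bound(n):.3e}  below tol: {bound(n) < tol}")

# Observed error on 1001 equally spaced sample points of [-1, 1].
xs = [-1 + i / 500 for i in range(1001)]
worst8 = max(abs(math.exp(x) - p_n(x, 8)) for x in xs)
worst9 = max(abs(math.exp(x) - p_n(x, 9)) for x in xs)
print(f"max sampled |E_8(x)| = {worst8:.3e}")
print(f"max sampled |E_9(x)| = {worst9:.3e}")
```

The bound for $N = 9$ exceeds the tolerance, yet the sampled error of $P_9$ stays below it while that of $P_8$ does not, which agrees with the conclusion of the example.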


Methods for Evaluating a Polynomial


Consider, for example, the function

(9)    $f(x) = (x - 1)^8.$

The evaluation of $f$ will require the use of an exponential function. Or the binomial
formula can be used to expand $f(x)$ in powers of $x$:

(10)    $f(x) = x^8 - 8x^7 + 28x^6 - 56x^5 + 70x^4 - 56x^3 + 28x^2 - 8x + 1.$

Horner's method (see Section 1.1), which is also called nested multiplication, can
now be used to evaluate the polynomial in (10). When applied to formula (10), nested
multiplication permits us to write

(11)    $f(x) = (((((((x - 8)x + 28)x - 56)x + 70)x - 56)x + 28)x - 8)x + 1.$

To evaluate $f(x)$ now requires seven multiplications and eight additions or
subtractions. The necessity of using an exponential function to evaluate the polynomial
has now been eliminated.
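Nested multiplication is short enough to sketch directly. The Python function below (not MATLAB) is a generic illustration; the name `horner` and the convention of listing coefficients from the highest power down are assumptions of this sketch. Because it starts from the leading coefficient rather than from $(x - 8)$, it performs one more multiplication than formula (11), which exploits the leading coefficient being 1.

```python
def horner(coeffs, x):
    """Evaluate a polynomial by nested multiplication.

    coeffs lists the coefficients from the highest power of x
    down to the constant term.
    """
    result = coeffs[0]
    for a in coeffs[1:]:
        result = result * x + a  # one multiplication and one addition per step
    return result

# Coefficients of (x - 1)^8 taken from the expansion (10).
c = [1, -8, 28, -56, 70, -56, 28, -8, 1]
for x in (0, 2, 3, 0.5):
    print(x, horner(c, x), (x - 1) ** 8)
```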

We end this section with the theorem that relates the Taylor series in Table 4.1 and 
the Taylor polynomials of Theorem 4.1. 


Theorem 4.2 (Taylor Series). Assume that $f(x)$ is analytic and has continuous
derivatives of all order $N = 1, 2, \ldots$, on an interval $(a, b)$ containing $x_0$. Suppose that
the Taylor polynomials (2) tend to a limit

(12)    $S(x) = \lim_{N \to \infty} P_N(x) = \lim_{N \to \infty} \sum_{k=0}^{N} \frac{f^{(k)}(x_0)}{k!}(x - x_0)^k;$

then $f(x)$ has the Taylor series expansion

(13)    $f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!}(x - x_0)^k.$

Proof. This follows directly from the definition of convergence of series in
Section 1.1. The limit condition is often stated by saying that the error term must go
to zero as $N$ goes to infinity. Therefore, a necessary and sufficient condition for (13)
to hold is that

$\lim_{N \to \infty} E_N(x) = \lim_{N \to \infty} \frac{f^{(N+1)}(c)(x - x_0)^{N+1}}{(N+1)!} = 0,$

where $c$ depends on $N$ and $x$. •


Exercises for Taylor Series and Calculation of Functions 

1. Let $f(x) = \sin(x)$ and apply Theorem 4.1.

(a) Use $x_0 = 0$ and find $P_5(x)$, $P_7(x)$, and $P_9(x)$.

(b) Show that if $|x| \le 1$ then the approximation

$\sin(x) \approx x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!}$

has the error bound $|E_9(x)| < 1/10! \le 2.75574 \times 10^{-7}$.

(c) Use $x_0 = \pi/4$ and find $P_5(x)$, which involves powers of $(x - \pi/4)$.

2. Let $f(x) = \cos(x)$ and apply Theorem 4.1.

(a) Use $x_0 = 0$ and find $P_4(x)$, $P_6(x)$, and $P_8(x)$.

(b) Show that if $|x| \le 1$ then the approximation

$\cos(x) \approx 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \frac{x^8}{8!}$

has the error bound $|E_8(x)| < 1/9! \le 2.75574 \times 10^{-6}$.

(c) Use $x_0 = \pi/4$ and find $P_4(x)$, which involves powers of $(x - \pi/4)$.

3. Does $f(x) = x^{1/2}$ have a Taylor series expansion about $x_0 = 0$? Justify your answer.
Does the function $f(x) = x^{1/2}$ have a Taylor series expansion about $x_0 = 1$? Justify
your answer.

4. (a) Find a Taylor polynomial of degree $N = 5$ for $f(x) = 1/(1 + x)$ expanded
about $x_0 = 0$.

(b) Find the error term $E_5(x)$ for the polynomial in part (a).

5. Find the Taylor polynomial of degree $N = 3$ for $f(x) = e^{-x^2/2}$ expanded about
$x_0 = 0$.

6. Find the Taylor polynomial of degree $N = 3$, $P_3(x)$, for $f(x) = x^3 - 2x^2 + 2x$
expanded about $x_0 = 1$. Show that $f(x) = P_3(x)$.

7. (a) Find the Taylor polynomial of degree $N = 5$ for $f(x) = x^{1/2}$ expanded about
$x_0 = 4$.

(b) Find the Taylor polynomial of degree $N = 5$ for $f(x) = x^{1/2}$ expanded about
$x_0 = 9$.

(c) Determine which of the polynomials in parts (a) and (b) best approximates
$(6.5)^{1/2}$.

8. Use $f(x) = (2 + x)^{1/2}$ and apply Theorem 4.1.

(a) Find the Taylor polynomial $P_3(x)$ expanded about $x_0 = 2$.

(b) Use $P_3(x)$ to find an approximation to $3^{1/2}$.

(c) Find the maximum value of $|f^{(4)}(c)|$ on the interval $1 \le c \le 3$ and find a bound
for $|E_3(x)|$.

9. Determine the degree of the Taylor polynomial Pn(x) expanded about xo = 0 that 
should be used to approximate 1 so that the error is less than 10 -6 . 

10. Determine the degree of the Taylor polynomial $P_N(x)$ expanded about $x_0 = \pi$ that
should be used to approximate $\cos(33\pi/32)$ so that the error is less than $10^{-6}$.

11. {a) Find the Taylor polynomial of degree N = 4 for F(x) — f *, co s(t 2 )dt ex¬ 

panded about xo = 0 . 

(b) Use the Taylor polynomial to approximate F(0.1). 

(c) Find a bound on the error to the approximation in part (b). 

12. (a) Use the geometric series

$\frac{1}{1 + x^2} = 1 - x^2 + x^4 - x^6 + x^8 - \cdots$  for $|x| < 1$,

and integrate both sides term by term to obtain

$\arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots$  for $|x| < 1$.

(b) Use $\pi/6 = \arctan(3^{-1/2})$ and the series in part (a) to show that

$\pi = 3^{1/2} \times 2\left(1 - \frac{3^{-1}}{3} + \frac{3^{-2}}{5} - \frac{3^{-3}}{7} + \frac{3^{-4}}{9} - \cdots\right).$

(c) Use the series in part (b) to compute $\pi$ accurate to eight digits.
Fact. $\pi \approx 3.141592653589793238\ldots$.

13. Use $f(x) = \ln(1 + x)$ and $x_0 = 0$, and apply Theorem 4.1.

(a) Show that $f^{(k)}(x) = (-1)^{k-1}((k - 1)!)/(1 + x)^k$.

(b) Show that the Taylor polynomial of degree $N$ is

$P_N(x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots + \frac{(-1)^{N-1}x^N}{N}.$

(c) Show that the error term for $P_N(x)$ is

$E_N(x) = \frac{(-1)^N x^{N+1}}{(N + 1)(1 + c)^{N+1}}.$

(d) Evaluate $P_3(0.5)$, $P_6(0.5)$, and $P_9(0.5)$. Compare with $\ln(1.5)$.

(e) Show that if $0.0 \le x \le 0.5$ then the approximation

$\ln(1 + x) \approx x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots - \frac{x^8}{8} + \frac{x^9}{9}$

has the error bound $|E_9| \le 0.00009765\ldots$.

14. Binomial series. Let $f(x) = (1 + x)^p$ and $x_0 = 0$.

(a) Show that $f^{(k)}(x) = p(p - 1)\cdots(p - k + 1)(1 + x)^{p-k}$.

(b) Show that the Taylor polynomial of degree $N$ is

$P_N(x) = 1 + px + \frac{p(p - 1)x^2}{2!} + \cdots + \frac{p(p - 1)\cdots(p - N + 1)x^N}{N!}.$

(c) Show that

$E_N(x) = p(p - 1)\cdots(p - N)x^{N+1}/((1 + c)^{N+1-p}(N + 1)!).$

(d) Set $p = 1/2$ and compute $P_2(0.5)$, $P_4(0.5)$, and $P_6(0.5)$. Compare with
$(1.5)^{1/2}$.

(e) Show that if $0.0 \le x \le 0.5$ then the approximation

$(1 + x)^{1/2} \approx 1 + \frac{x}{2} - \frac{x^2}{8} + \frac{x^3}{16} - \frac{5x^4}{128} + \frac{7x^5}{256}$

has the error bound $|E_5| \le (0.5)^6(21/1024) = 0.0003204\ldots$.

(f) Show that if $p = N$ is a positive integer, then

$P_N(x) = 1 + Nx + \frac{N(N - 1)x^2}{2!} + \cdots + Nx^{N-1} + x^N.$

Notice that this is the familiar binomial expansion.

15. Find $c$ such that $|E_4| < 10^{-6}$ whenever $|x - x_0| < c$.

(a) Let $f(x) = \cos(x)$ and $x_0 = 0$.

(b) Let $f(x) = \sin(x)$ and $x_0 = \pi/2$.

(c) Let $f(x) = e^x$ and $x_0 = 0$.

16. (a) Suppose that $y = f(x)$ is an even function (i.e., $f(-x) = f(x)$ for all $x$ in the
domain of $f$). What can be said about $P_N(x)$?

(b) Suppose that $y = f(x)$ is an odd function (i.e., $f(-x) = -f(x)$ for all $x$ in the
domain of $f$). What can be said about $P_N(x)$?





17. Let $y = f(x)$ be a polynomial of degree $N$. If $f(x_0) > 0$ and $f'(x_0), \ldots, f^{(N)}(x_0) \ge 0$,
show that all the real roots of $f$ are less than $x_0$. Hint. Expand $f$ in a Taylor
polynomial of degree $N$ about $x_0$.

18. Let $f(x) = e^x$. Use Theorem 4.1 to find $P_N(x)$, for $N = 1, 2, 3, \ldots$, expanded
about $x_0 = 0$. Show that every real root of $P_N(x)$ has multiplicity less than or equal
to one. Note. If $p$ is a root of multiplicity $M$ of the polynomial $P(x)$, then $p$ is a root
of multiplicity $M - 1$ of $P'(x)$.

19. Finish the proof of Corollary 4.1 by writing down the expression for $P_N^{(k)}(x)$ and
showing that

$P_N^{(k)}(x_0) = f^{(k)}(x_0)$  for $k = 2, 3, \ldots, N$.


Exercises 20 and 21 form a proof of Taylor's theorem.

20. Let $g(t)$ and its derivatives $g^{(k)}(t)$, for $k = 1, 2, \ldots, N + 1$, be continuous on the
interval $(a, b)$, which contains $x_0$. Suppose that there exist two distinct points $x$ and
$x_0$ such that $g(x) = 0$, and $g(x_0) = g'(x_0) = \cdots = g^{(N)}(x_0) = 0$. Prove that there
exists a value $c$ that lies between $x_0$ and $x$ such that $g^{(N+1)}(c) = 0$.

Remark. Note that $g(t)$ is a function of $t$, and the values $x$ and $x_0$ are to be treated
as constants with respect to the variable $t$.

Hint. Use Rolle's theorem (Theorem 1.5, Section 1.1) on the interval with end
points $x_0$ and $x$ to find the number $c_1$ such that $g'(c_1) = 0$. Then use Rolle's theorem
applied to the function $g'(t)$ on the interval with end points $x_0$ and $c_1$ to find the
number $c_2$ such that $g''(c_2) = 0$. Inductively repeat the process until the number
$c_{N+1}$ is found such that $g^{(N+1)}(c_{N+1}) = 0$.

21. Use the result of Exercise 20 and the special function

$g(t) = f(t) - P_N(t) - E_N(x)\frac{(t - x_0)^{N+1}}{(x - x_0)^{N+1}},$

where $P_N(x)$ is the Taylor polynomial of degree $N$, to prove that the error term
$E_N(x) = f(x) - P_N(x)$ has the form

$E_N(x) = f^{(N+1)}(c)\frac{(x - x_0)^{N+1}}{(N + 1)!}.$

Hint. Find $g^{(N+1)}(t)$ and evaluate it at $t = c$.


Algorithms and Programs 

The matrix nature of MATLAB allows us to quickly evaluate functions at a large
number of values. If X=[-1 0 1], then sin(X) will produce [sin(-1) sin(0) sin(1)].
Similarly, if X=-1:0.1:1, then Y=sin(X) will produce a matrix Y of the same dimension
as X with the appropriate values of sine. These two row matrices can be displayed in the
form of a table by defining the matrix D=[X' Y']. (Note. The matrices X and Y must be
of the same length.)

1. (a) Use the plot command to plot $\sin(x)$, $P_5(x)$, $P_7(x)$, and $P_9(x)$ from Exercise 1
on the same graph using the interval $-1 \le x \le 1$.

(b) Create a table with columns that consist of $\sin(x)$, $P_5(x)$, $P_7(x)$, and $P_9(x)$
evaluated at 10 equally spaced values of $x$ from the interval $[-1, 1]$.

2. (a) Use the plot command to plot $\cos(x)$, $P_4(x)$, $P_6(x)$, and $P_8(x)$ from Exercise 2
on the same graph using the interval $-1 \le x \le 1$.

(b) Create a table with columns that consist of $\cos(x)$, $P_4(x)$, $P_6(x)$, and $P_8(x)$
evaluated at 19 equally spaced values of $x$ from the interval $[-1, 1]$.


4.2 Introduction to Interpolation

In Section 4.1 we saw how a Taylor polynomial can be used to approximate the
function $f(x)$. The information needed to construct the Taylor polynomial is the value
of $f$ and its derivatives at $x_0$. A shortcoming is that the higher-order derivatives must
be known, and often they are either not available or they are hard to compute.

Suppose that the function $y = f(x)$ is known at the $N + 1$ points $(x_0, y_0), \ldots,$
$(x_N, y_N)$, where the values $x_k$ are spread out over the interval $[a, b]$ and satisfy

$a \le x_0 < x_1 < \cdots < x_N \le b$  and  $y_k = f(x_k)$.

A polynomial $P(x)$ of degree $N$ will be constructed that passes through these $N + 1$
points. In the construction, only the numerical values $x_k$ and $y_k$ are needed. Hence
the higher-order derivatives are not necessary. The polynomial $P(x)$ can be used to
approximate $f(x)$ over the entire interval $[a, b]$. However, if the error function
$E(x) = f(x) - P(x)$ is required, then we will need to know $f^{(N+1)}(x)$ and a bound for its
magnitude, that is,

$M = \max\{|f^{(N+1)}(x)| : a \le x \le b\}.$

Situations in statistical and scientific analysis arise where the function $y = f(x)$
is available only at $N + 1$ tabulated points $(x_k, y_k)$, and a method is needed to
approximate $f(x)$ at nontabulated abscissas. If there is a significant amount of error in the
tabulated values, then the methods of curve fitting in Chapter 5 should be considered.
On the other hand, if the points $(x_k, y_k)$ are known to a high degree of accuracy, then
the polynomial curve $y = P(x)$ that passes through them can be considered. When
$x_0 < x < x_N$, the approximation $P(x)$ is called an interpolated value. If either
$x < x_0$ or $x_N < x$, then $P(x)$ is called an extrapolated value. Polynomials are used to
design software algorithms to approximate functions, for numerical differentiation, for
numerical integration, and for making computer-drawn curves that must pass through
specified points.






Figure 4.7 (a) The approximating polynomial $P(x)$ can be used for interpolation
at the point $(4, P(4))$ and extrapolation at the point $(5.5, P(5.5))$.

Figure 4.7 (b) The approximating polynomial $P(x)$ is differentiated and $P'(x)$ is
used to find the slope at the interpolation point $(4, P(4))$. The tangent line has
slope $P'(4)$.


Let us briefly mention how to evaluate the polynomial $P(x)$:

(1)    $P(x) = a_N x^N + a_{N-1}x^{N-1} + \cdots + a_2 x^2 + a_1 x + a_0.$

Horner's method of synthetic division is an efficient way to evaluate $P(x)$. The
derivative $P'(x)$ is

(2)    $P'(x) = N a_N x^{N-1} + (N - 1)a_{N-1}x^{N-2} + \cdots + 2a_2 x + a_1$

and the indefinite integral $I(x) = \int P(x)\,dx$, which satisfies $I'(x) = P(x)$, is

(3)    $I(x) = \frac{a_N x^{N+1}}{N + 1} + \frac{a_{N-1}x^N}{N} + \cdots + \frac{a_2 x^3}{3} + \frac{a_1 x^2}{2} + a_0 x + C,$

where $C$ is the constant of integration. Algorithm 4.1 (end of Section 4.2) shows how
to adapt Horner's method to $P'(x)$ and $I(x)$.


Example 4.4. The polynomial $P(x) = -0.02x^3 + 0.2x^2 - 0.4x + 1.28$ passes through
the four points $(1, 1.06)$, $(2, 1.12)$, $(3, 1.34)$, and $(5, 1.78)$. Find (a) $P(4)$, (b) $P'(4)$,
(c) $\int_1^4 P(x)\,dx$, and (d) $P(5.5)$. Finally, (e) show how to find the coefficients of $P(x)$.

Use Algorithm 4.1 (i)-(iii) (this is equivalent to the process in Table 1.2) with $x = 4$.

(a)  $b_3 = a_3 = -0.02$
     $b_2 = a_2 + b_3 x = 0.2 + (-0.02)(4) = 0.12$
     $b_1 = a_1 + b_2 x = -0.4 + (0.12)(4) = 0.08$
     $b_0 = a_0 + b_1 x = 1.28 + (0.08)(4) = 1.60.$
Figure 4.8 The approximating polynomial $P(x)$ is integrated and
its antiderivative is used to find the area under the curve for $1 \le x \le 4$.

The interpolated value is $P(4) = 1.60$ (see Figure 4.7(a)).

(b)  $d_2 = 3a_3 = -0.06$
     $d_1 = 2a_2 + d_2 x = 0.4 + (-0.06)(4) = 0.16$
     $d_0 = a_1 + d_1 x = -0.4 + (0.16)(4) = 0.24.$

The numerical derivative is $P'(4) = 0.24$ (see Figure 4.7(b)).

(c)  $i_4 = a_3/4 = -0.005$
     $i_3 = a_2/3 + i_4 x = 0.06666667 + (-0.005)(4) = 0.04666667$
     $i_2 = a_1/2 + i_3 x = -0.2 + (0.04666667)(4) = -0.01333333$
     $i_1 = a_0 + i_2 x = 1.28 + (-0.01333333)(4) = 1.22666667$
     $i_0 = 0 + i_1 x = 0 + (1.22666667)(4) = 4.90666667.$

Hence $I(4) = 4.90666667$. Similarly, $I(1) = 1.14166667$. Therefore, $\int_1^4 P(x)\,dx =
I(4) - I(1) = 3.765$ (see Figure 4.8).

(d) Use Algorithm 4.1(i) with $x = 5.5$.

     $b_3 = a_3 = -0.02$
     $b_2 = a_2 + b_3 x = 0.2 + (-0.02)(5.5) = 0.09$
     $b_1 = a_1 + b_2 x = -0.4 + (0.09)(5.5) = 0.095$
     $b_0 = a_0 + b_1 x = 1.28 + (0.095)(5.5) = 1.8025.$

The extrapolated value is $P(5.5) = 1.8025$ (see Figure 4.7(a)).
Figure 4.10 The graph of the error $y = E(x) = \ln(1 + x) - P(x)$.


Remark. At a node $x_k$ we have $f(x_k) = P(x_k)$. Hence $E(x_k) = 0$ at a node. The graph of
$E(x) = f(x) - P(x)$ looks like a vibrating string, with the nodes being the abscissas where
there is no displacement. ■


Algorithm 4.1 (Polynomial Calculus). To evaluate the polynomial $P(x)$, its
derivative $P'(x)$, and its integral $\int P(x)\,dx$ by performing synthetic division.


    INPUT N                          {Degree of P(x)}
    INPUT A(0), A(1), ..., A(N)      {Coefficients of P(x)}
    INPUT C                          {Constant of integration}
    INPUT X                          {Independent variable}


(i) Algorithm to Evaluate P(x)

    B(N) := A(N)
    FOR K = N - 1 DOWNTO 0 DO
        B(K) := A(K) + B(K + 1) * X
    PRINT "The value P(x) is", B(0)

    Space-saving version:

    Poly := A(N)
    FOR K = N - 1 DOWNTO 0 DO
        Poly := A(K) + Poly * X
    PRINT "The value P(x) is", Poly

(ii) Algorithm to Evaluate P'(x)

    D(N - 1) := N * A(N)
    FOR K = N - 1 DOWNTO 1 DO
        D(K - 1) := K * A(K) + D(K) * X
    PRINT "The value P'(x) is", D(0)

    Space-saving version:

    Deriv := N * A(N)
    FOR K = N - 1 DOWNTO 1 DO
        Deriv := K * A(K) + Deriv * X
    PRINT "The value P'(x) is", Deriv

(iii) Algorithm to Evaluate I(x)

    I(N + 1) := A(N)/(N + 1)
    FOR K = N DOWNTO 1 DO
        I(K) := A(K - 1)/K + I(K + 1) * X
    I(0) := C + I(1) * X
    PRINT "The value I(x) is", I(0)

    Space-saving version:

    Integ := A(N)/(N + 1)
    FOR K = N DOWNTO 1 DO
        Integ := A(K - 1)/K + Integ * X
    Integ := C + Integ * X
    PRINT "The value I(x) is", Integ
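The space-saving versions of Algorithm 4.1 translate almost line for line into code. The sketch below uses Python rather than MATLAB; the function names are hypothetical, and the coefficient list `a` is assumed to be indexed like A(K) in the algorithm, so that `a[k]` multiplies $x^k$. The data are those of Example 4.4.

```python
def poly_value(a, x):
    """P(x) by Horner's method; a[k] is the coefficient of x**k."""
    n = len(a) - 1
    poly = a[n]
    for k in range(n - 1, -1, -1):
        poly = a[k] + poly * x
    return poly

def poly_derivative(a, x):
    """P'(x) by the adapted Horner scheme of part (ii)."""
    n = len(a) - 1
    if n == 0:
        return 0.0  # a constant polynomial has zero derivative
    deriv = n * a[n]
    for k in range(n - 1, 0, -1):
        deriv = k * a[k] + deriv * x
    return deriv

def poly_integral(a, x, c=0.0):
    """I(x), the antiderivative with constant of integration c, as in part (iii)."""
    n = len(a) - 1
    integ = a[n] / (n + 1)
    for k in range(n, 0, -1):
        integ = a[k - 1] / k + integ * x
    return c + integ * x

# Example 4.4: P(x) = -0.02x^3 + 0.2x^2 - 0.4x + 1.28
a = [1.28, -0.4, 0.2, -0.02]
print("P(4)     =", poly_value(a, 4))
print("P'(4)    =", poly_derivative(a, 4))
print("integral =", poly_integral(a, 4) - poly_integral(a, 1))
print("P(5.5)   =", poly_value(a, 5.5))
```

Up to rounding in the last digits, the four printed values should agree with parts (a) through (d) of Example 4.4.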
Exercises for Introduction to Interpolation


1. Consider $P(x) = -0.02x^3 + 0.1x^2 - 0.2x + 1.66$, which passes through the four
points $(1, 1.54)$, $(2, 1.5)$, $(3, 1.42)$, and $(5, 0.66)$.

(a) Find $P(4)$.

(b) Find $P'(4)$.

(c) Find the definite integral of $P(x)$ taken over $[1, 4]$.

(d) Find the extrapolated value $P(5.5)$.

(e) Show how to find the coefficients of $P(x)$.

2. Consider $P(x) = -0.04x^3 + 0.14x^2 - 0.16x + 2.08$, which passes through the four
points $(0, 2.08)$, $(1, 2.02)$, $(2, 2.00)$, and $(4, 1.12)$.

(a) Find $P(3)$.

(b) Find $P'(3)$.

(c) Find the definite integral of $P(x)$ taken over $[0, 3]$.

(d) Find the extrapolated value $P(4.5)$.

(e) Show how to find the coefficients of $P(x)$.

3. Consider $P(x) = -0.0291666667x^3 + 0.275x^2 - 0.570833333x + 1.375$, which
passes through the four points $(1, 1.05)$, $(2, 1.10)$, $(3, 1.35)$, and $(5, 1.75)$.

(a) Show that the ordinates 1.05, 1.10, 1.35, and 1.75 differ from those of
Example 4.4 by less than 1.8%, yet the coefficients of $x^3$ and $x$ differ by more than
42%.

(b) Find $P(4)$ and compare with Example 4.4.

(c) Find $P'(4)$ and compare with Example 4.4.

(d) Find the definite integral of $P(x)$ taken over $[1, 4]$ and compare with
Example 4.4.

(e) Find the extrapolated value $P(5.5)$ and compare with Example 4.4.

Remark. Part (a) shows that the computation of the coefficients of an interpolating
polynomial is an ill-conditioned problem.


Algorithms and Programs 


1. Write a program in MATLAB that will implement Algorithm 4.1. The program
should accept the coefficients of the polynomial
$P(x) = a_N x^N + a_{N-1}x^{N-1} + \cdots + a_2 x^2 + a_1 x + a_0$
as a $1 \times (N + 1)$ matrix: $P = [a_N \; a_{N-1} \; \cdots \; a_2 \; a_1 \; a_0]$.

2. For each of the given functions, the fifth-degree polynomial $P(x)$ passes through
the six points $(0, f(0))$, $(0.2, f(0.2))$, $(0.4, f(0.4))$, $(0.6, f(0.6))$, $(0.8, f(0.8))$,
$(1, f(1))$. The six coefficients of $P(x)$ are $a_0, a_1, \ldots, a_5$, where

$P(x) = a_5 x^5 + a_4 x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0.$

(i) Find the coefficients of $P(x)$ by solving the $6 \times 6$ system of linear equations

$a_0 + a_1 x_j + a_2 x_j^2 + a_3 x_j^3 + a_4 x_j^4 + a_5 x_j^5 = f(x_j)$

using $x_j = (j - 1)/5$ and $j = 1, 2, 3, 4, 5, 6$ for the six unknowns.

(ii) Use your MATLAB program from Problem 1 to compute the interpolated
values $P(0.3)$, $P(0.4)$, and $P(0.5)$ and compare with $f(0.3)$, $f(0.4)$, and $f(0.5)$,
respectively.

(iii) Use your MATLAB program to compute the extrapolated values $P(-0.1)$ and
$P(1.1)$ and compare with $f(-0.1)$ and $f(1.1)$, respectively.

(iv) Use your MATLAB program to find the integral of $P(x)$ taken over $[0, 1]$
and compare with the integral of $f(x)$ taken over $[0, 1]$. Plot $f(x)$ and $P(x)$
over $[0, 1]$ on the same graph.

(v) Make a table of values for $P(x_k)$, $f(x_k)$, and $E(x_k) = f(x_k) - P(x_k)$, where
$x_k = k/100$ for $k = 0, 1, \ldots, 100$.

(a) $f(x) = e^x$

(b) $f(x) = \sin(x)$

(c) $f(x) = (x + 1)^{(x+1)}$

3. A portion of an amusement park ride is to be modeled using three polynomials. The
first section is to be a first-degree polynomial, $P_1(x)$, that covers a horizontal
distance of 100 feet, starts at a height of 110 feet, and ends at a height of 60 feet. The
third section is to also be a first-degree polynomial, $Q_1(x)$, that covers a horizontal
distance of 50 feet, starts at a height of 65 feet, and ends at a height of 70 feet. The
middle section is to be a polynomial, $P(x)$ (of smallest possible degree), that covers
a horizontal distance of 150 feet.

(a) Find expressions for $P(x)$, $P_1(x)$, and $Q_1(x)$ such that $P(100) = P_1(100)$,
$P'(100) = P_1'(100)$, $P(250) = Q_1(250)$, and $P'(250) = Q_1'(250)$ and the
curvature of $P(x)$ equals the curvature of $P_1(x)$ at $x = 100$ and equals the
curvature of $Q_1(x)$ at $x = 250$.

(b) Plot the graphs of $P_1(x)$, $P(x)$, and $Q_1(x)$ on the same coordinate system.

(c) Use Algorithm 4.1(iii) to find the average height of the ride over the given
horizontal distance.


Lagrange Approximation 

Interpolation means to estimate a missing function value by taking a weighted
average of known function values at neighboring points. Linear interpolation uses a line
segment that passes through two points. The slope between $(x_0, y_0)$ and $(x_1, y_1)$ is
$m = (y_1 - y_0)/(x_1 - x_0)$, and the point-slope formula for the line $y = m(x - x_0) + y_0$
can be rearranged as

(1)    $y = P(x) = y_0 + (y_1 - y_0)\frac{x - x_0}{x_1 - x_0}.$

When formula (1) is expanded, the result is a polynomial of degree $\le 1$. Evaluation of
$P(x)$ at $x_0$ and $x_1$ produces $y_0$ and $y_1$, respectively:

(2)    $P(x_0) = y_0 + (y_1 - y_0)(0) = y_0,$
       $P(x_1) = y_0 + (y_1 - y_0)(1) = y_1.$

The French mathematician Joseph Louis Lagrange used a slightly different method to
find this polynomial. He noticed that it could be written as

(3)    $y = P_1(x) = y_0\frac{x - x_1}{x_0 - x_1} + y_1\frac{x - x_0}{x_1 - x_0}.$

Each term on the right side of (3) involves a linear factor; hence the sum is a polynomial
of degree $\le 1$. The quotients in (3) are denoted by

(4)    $L_{1,0}(x) = \frac{x - x_1}{x_0 - x_1}$  and  $L_{1,1}(x) = \frac{x - x_0}{x_1 - x_0}.$

Computation reveals that $L_{1,0}(x_0) = 1$, $L_{1,0}(x_1) = 0$, $L_{1,1}(x_0) = 0$, and $L_{1,1}(x_1) = 1$
so that the polynomial $P_1(x)$ in (3) also passes through the two given points:

(5)    $P_1(x_0) = y_0 + y_1(0) = y_0$  and  $P_1(x_1) = y_0(0) + y_1 = y_1.$

The terms $L_{1,0}(x)$ and $L_{1,1}(x)$ in (4) are called Lagrange coefficient polynomials
based on the nodes $x_0$ and $x_1$. Using this notation, (3) can be written in summation
form

(6)    $P_1(x) = \sum_{k=0}^{1} y_k L_{1,k}(x).$

Suppose that the ordinates $y_k$ are computed with the formula $y_k = f(x_k)$. If $P_1(x)$ is
used to approximate $f(x)$ over the interval $[x_0, x_1]$, we call the process interpolation.
If $x < x_0$ (or $x_1 < x$), then using $P_1(x)$ is called extrapolation. The next example
illustrates these concepts.


Example 4.6. Consider the graph $y = f(x) = \cos(x)$ over $[0.0, 1.2]$.

(a) Use the nodes $x_0 = 0.0$ and $x_1 = 1.2$ to construct a linear interpolation
polynomial $P_1(x)$.

(b) Use the nodes $x_0 = 0.2$ and $x_1 = 1.0$ to construct a linear approximating
polynomial $Q_1(x)$.

Using (3) with the abscissas $x_0 = 0.0$ and $x_1 = 1.2$ and the ordinates $y_0 = \cos(0.0) =
1.000000$ and $y_1 = \cos(1.2) = 0.362358$ produces

$P_1(x) = 1.000000\frac{x - 1.2}{0.0 - 1.2} + 0.362358\frac{x - 0.0}{1.2 - 0.0}$

$\phantom{P_1(x)} = -0.833333(x - 1.2) + 0.301965(x - 0.0).$

Figure 4.11 (a) The linear approximation of $y = P_1(x)$ where the nodes $x_0 = 0.0$
and $x_1 = 1.2$ are the end points of the interval $[a, b]$. (b) The linear approximation of
$y = Q_1(x)$ where the nodes $x_0 = 0.2$ and $x_1 = 1.0$ lie inside the interval $[a, b]$.


When the nodes $x_0 = 0.2$ and $x_1 = 1.0$ with $y_0 = \cos(0.2) = 0.980067$ and $y_1 =
\cos(1.0) = 0.540302$ are used, the result is

$Q_1(x) = 0.980067\frac{x - 1.0}{0.2 - 1.0} + 0.540302\frac{x - 0.2}{1.0 - 0.2}$

$\phantom{Q_1(x)} = -1.225083(x - 1.0) + 0.675378(x - 0.2).$

Figure 4.11(a) and (b) show the graph of $y = \cos(x)$ and compare it with $y = P_1(x)$ and
$y = Q_1(x)$, respectively. Numerical computations are given in Table 4.6 and reveal that
$Q_1(x)$ has less error at the points $x_k$ that satisfy $0.1 \le x_k \le 1.1$. The largest tabulated
error, $f(0.6) - P_1(0.6) = 0.144157$, is reduced to $f(0.6) - Q_1(0.6) = 0.065151$ by using
$Q_1(x)$. ■
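The comparison in Example 4.6 can be reproduced with a short computation. The Python sketch below (not MATLAB) builds the two linear Lagrange polynomials from formula (3); the helper name `lagrange_linear` and the tabulation step of 0.1 are assumptions made for illustration.

```python
import math

def lagrange_linear(x0, y0, x1, y1):
    """Return the linear Lagrange polynomial of formula (3) as a function."""
    def p(x):
        return y0 * (x - x1) / (x0 - x1) + y1 * (x - x0) / (x1 - x0)
    return p

# P_1 uses the end points of [0.0, 1.2]; Q_1 uses the interior nodes 0.2 and 1.0.
p1 = lagrange_linear(0.0, math.cos(0.0), 1.2, math.cos(1.2))
q1 = lagrange_linear(0.2, math.cos(0.2), 1.0, math.cos(1.0))

print("   x     cos(x)   cos-P1     cos-Q1")
for k in range(13):
    x = k / 10
    f = math.cos(x)
    print(f"{x:4.1f}  {f:9.6f}  {f - p1(x):9.6f}  {f - q1(x):9.6f}")
```

Outside the nodes 0.2 and 1.0 the polynomial $Q_1$ is extrapolating, which is why its error is not smaller than that of $P_1$ at every tabulated point.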

The generalization of (6) is the construction of a polynomial $P_N(x)$ of degree at
most $N$ that passes through the $N + 1$ points $(x_0, y_0), (x_1, y_1), \ldots, (x_N, y_N)$ and has
the form

(7)    $P_N(x) = \sum_{k=0}^{N} y_k L_{N,k}(x),$

where $L_{N,k}$ is the Lagrange coefficient polynomial based on these nodes:

(8)    $L_{N,k}(x) = \frac{(x - x_0)\cdots(x - x_{k-1})(x - x_{k+1})\cdots(x - x_N)}{(x_k - x_0)\cdots(x_k - x_{k-1})(x_k - x_{k+1})\cdots(x_k - x_N)}.$

It is understood that the terms $(x - x_k)$ and $(x_k - x_k)$ do not appear on the right side of
equation (8). It is appropriate to introduce the product notation for (8), and we write

(9)    $L_{N,k}(x) = \frac{\prod_{j=0,\, j \ne k}^{N}(x - x_j)}{\prod_{j=0,\, j \ne k}^{N}(x_k - x_j)}.$

Here the notation in (9) indicates that in the numerator the product of the linear
factors $(x - x_j)$ is to be formed, but the factor $(x - x_k)$ is to be left out (or skipped).
A similar construction occurs in the denominator.

A straightforward calculation shows that, for each fixed $k$, the Lagrange coefficient
polynomial $L_{N,k}(x)$ has the property

(10)    $L_{N,k}(x_j) = 1$ when $j = k$  and  $L_{N,k}(x_j) = 0$ when $j \ne k$.

Then direct substitution of these values into (7) is used to show that the polynomial
curve $y = P_N(x)$ goes through $(x_j, y_j)$:

(11)    $P_N(x_j) = y_0 L_{N,0}(x_j) + \cdots + y_j L_{N,j}(x_j) + \cdots + y_N L_{N,N}(x_j)$
        $\phantom{P_N(x_j)} = y_0(0) + \cdots + y_j(1) + \cdots + y_N(0) = y_j.$

To show that $P_N(x)$ is unique, we invoke the fundamental theorem of algebra,
which states that a nonzero polynomial $T(x)$ of degree $\le N$ has at most $N$ roots. In other
words, if $T(x)$ is zero at $N + 1$ distinct abscissas, it is identically zero. Suppose that
$P_N(x)$ is not unique and that there exists another polynomial $Q_N(x)$ of degree $\le N$
that also passes through the $N + 1$ points. Form the difference polynomial $T(x) =
P_N(x) - Q_N(x)$. Observe that the polynomial $T(x)$ has degree $\le N$ and that $T(x_j) =
P_N(x_j) - Q_N(x_j) = y_j - y_j = 0$, for $j = 0, 1, \ldots, N$. Therefore, $T(x) \equiv 0$ and it
follows that $Q_N(x) = P_N(x)$.

Figure 4.12 (a) The quadratic approximation polynomial $y = P_2(x)$ based on the
nodes $x_0 = 0.0$, $x_1 = 0.6$, and $x_2 = 1.2$. (b) The cubic approximation polynomial
$y = P_3(x)$ based on the nodes $x_0 = 0.0$, $x_1 = 0.4$, $x_2 = 0.8$, and $x_3 = 1.2$.

When (7) is expanded, the result is similar to (3). The Lagrange quadratic interpolating
polynomial through the three points $(x_0, y_0)$, $(x_1, y_1)$, and $(x_2, y_2)$ is

(12)    $P_2(x) = y_0 \dfrac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)} + y_1 \dfrac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)} + y_2 \dfrac{(x - x_0)(x - x_1)}{(x_2 - x_0)(x_2 - x_1)}.$


The Lagrange cubic interpolating polynomial through the four points $(x_0, y_0)$, $(x_1, y_1)$,
$(x_2, y_2)$, and $(x_3, y_3)$ is

(13)    $P_3(x) = y_0 \dfrac{(x - x_1)(x - x_2)(x - x_3)}{(x_0 - x_1)(x_0 - x_2)(x_0 - x_3)} + y_1 \dfrac{(x - x_0)(x - x_2)(x - x_3)}{(x_1 - x_0)(x_1 - x_2)(x_1 - x_3)}$
        $+ \, y_2 \dfrac{(x - x_0)(x - x_1)(x - x_3)}{(x_2 - x_0)(x_2 - x_1)(x_2 - x_3)} + y_3 \dfrac{(x - x_0)(x - x_1)(x - x_2)}{(x_3 - x_0)(x_3 - x_1)(x_3 - x_2)}.$


Example 4.7. Consider $y = f(x) = \cos(x)$ over $[0.0, 1.2]$.

(a) Use the three nodes $x_0 = 0.0$, $x_1 = 0.6$, and $x_2 = 1.2$ to construct a quadratic
interpolation polynomial $P_2(x)$.

(b) Use the four nodes $x_0 = 0.0$, $x_1 = 0.4$, $x_2 = 0.8$, and $x_3 = 1.2$ to construct a cubic
interpolation polynomial $P_3(x)$.

Using $x_0 = 0.0$, $x_1 = 0.6$, $x_2 = 1.2$ and $y_0 = \cos(0.0) = 1$, $y_1 = \cos(0.6) = 0.825336$,
and $y_2 = \cos(1.2) = 0.362358$ in equation (12) produces

$P_2(x) = 1.0 \dfrac{(x - 0.6)(x - 1.2)}{(0.0 - 0.6)(0.0 - 1.2)} + 0.825336 \dfrac{(x - 0.0)(x - 1.2)}{(0.6 - 0.0)(0.6 - 1.2)} + 0.362358 \dfrac{(x - 0.0)(x - 0.6)}{(1.2 - 0.0)(1.2 - 0.6)}$

$= 1.388889(x - 0.6)(x - 1.2) - 2.292599(x - 0.0)(x - 1.2) + 0.503275(x - 0.0)(x - 0.6).$


Using $x_0 = 0.0$, $x_1 = 0.4$, $x_2 = 0.8$, $x_3 = 1.2$ and $y_0 = \cos(0.0) = 1.0$, $y_1 = \cos(0.4) =
0.921061$, $y_2 = \cos(0.8) = 0.696707$, and $y_3 = \cos(1.2) = 0.362358$ in equation (13)
produces

$P_3(x) = 1.000000 \dfrac{(x - 0.4)(x - 0.8)(x - 1.2)}{(0.0 - 0.4)(0.0 - 0.8)(0.0 - 1.2)} + 0.921061 \dfrac{(x - 0.0)(x - 0.8)(x - 1.2)}{(0.4 - 0.0)(0.4 - 0.8)(0.4 - 1.2)}$

$+ \, 0.696707 \dfrac{(x - 0.0)(x - 0.4)(x - 1.2)}{(0.8 - 0.0)(0.8 - 0.4)(0.8 - 1.2)} + 0.362358 \dfrac{(x - 0.0)(x - 0.4)(x - 0.8)}{(1.2 - 0.0)(1.2 - 0.4)(1.2 - 0.8)}$

$= -2.604167(x - 0.4)(x - 0.8)(x - 1.2) + 7.195789(x - 0.0)(x - 0.8)(x - 1.2)$

$- \, 5.443021(x - 0.0)(x - 0.4)(x - 1.2) + 0.943641(x - 0.0)(x - 0.4)(x - 0.8).$

The graphs of $y = \cos(x)$ and the polynomials $y = P_2(x)$ and $y = P_3(x)$ are shown in
Figure 4.12(a) and (b), respectively. ■
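The two interpolants of Example 4.7 can also be evaluated directly from the Lagrange form, without expanding any coefficients. The following is a minimal MATLAB sketch under that assumption; the variable names (nodeSets, xEval, others) are illustrative choices and do not come from the text. It evaluates $P_2(1.0)$ and $P_3(0.2)$ and compares them with $\cos(x)$.

```matlab
% Sketch: evaluate P_2 and P_3 of Example 4.7 straight from the Lagrange form.
nodeSets = {[0.0 0.6 1.2], [0.0 0.4 0.8 1.2]};  % nodes for P_2 and P_3
xEval    = [1.0 0.2];                           % evaluate P_2 at 1.0, P_3 at 0.2
P        = zeros(1, 2);                         % interpolated values
for s = 1:2
    X = nodeSets{s};
    Y = cos(X);                                 % ordinates y_k = cos(x_k)
    x = xEval(s);
    total = 0;
    for k = 1:length(X)
        others = X([1:k-1, k+1:end]);           % every node except X(k)
        Lk = prod(x - others) / prod(X(k) - others);  % L_{N,k}(x)
        total = total + Y(k) * Lk;
    end
    P(s) = total;
end
err = abs(cos(xEval) - P);                      % |E_2(1.0)| and |E_3(0.2)|
fprintf('|E2(1.0)| = %.6f, |E3(0.2)| = %.6f\n', err(1), err(2));
```

Evaluating the Lagrange form this way costs more per point than a precomputed coefficient vector, but it avoids forming the expanded polynomial at all.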


Error Terms and Error Bounds 

It is important to understand the nature of the error term when the Lagrange polynomial
is used to approximate a continuous function $f(x)$. It is similar to the error term for
the Taylor polynomial, except that the factor $(x - x_0)^{N+1}$ is replaced with the product
$(x - x_0)(x - x_1) \cdots (x - x_N)$. This is expected because interpolation is exact at each
of the $N + 1$ nodes $x_k$, where we have $E_N(x_k) = f(x_k) - P_N(x_k) = y_k - y_k = 0$ for
$k = 0, 1, 2, \ldots, N$.

Theorem 4.3 (Lagrange Polynomial Approximation). Assume that $f \in C^{N+1}[a, b]$
and that $x_0, x_1, \ldots, x_N \in [a, b]$ are $N + 1$ nodes. If $x \in [a, b]$, then

(14)    $f(x) = P_N(x) + E_N(x),$

where $P_N(x)$ is a polynomial that can be used to approximate $f(x)$:

(15)    $f(x) \approx P_N(x) = \sum_{k=0}^{N} f(x_k) L_{N,k}(x).$

The error term $E_N(x)$ has the form

(16)    $E_N(x) = \dfrac{(x - x_0)(x - x_1) \cdots (x - x_N) f^{(N+1)}(c)}{(N+1)!}$

for some value $c = c(x)$ that lies in the interval $[a, b]$.

Proof. As an example of the general method, we establish (16) when $N = 1$. The
general case is discussed in the exercises. Start by defining the special function $g(t)$ as
follows:

(17)    $g(t) = f(t) - P_1(t) - E_1(x) \dfrac{(t - x_0)(t - x_1)}{(x - x_0)(x - x_1)}.$

Notice that $x$, $x_0$, and $x_1$ are constants with respect to the variable $t$ and that $g(t)$ evaluates
to be zero at these three values; that is,

$g(x) = f(x) - P_1(x) - E_1(x) \dfrac{(x - x_0)(x - x_1)}{(x - x_0)(x - x_1)} = f(x) - P_1(x) - E_1(x) = 0,$

$g(x_0) = f(x_0) - P_1(x_0) - E_1(x) \dfrac{(x_0 - x_0)(x_0 - x_1)}{(x - x_0)(x - x_1)} = f(x_0) - P_1(x_0) = 0,$

$g(x_1) = f(x_1) - P_1(x_1) - E_1(x) \dfrac{(x_1 - x_0)(x_1 - x_1)}{(x - x_0)(x - x_1)} = f(x_1) - P_1(x_1) = 0.$

Suppose that $x$ lies in the open interval $(x_0, x_1)$. Applying Rolle's theorem to $g(t)$
on the interval $[x_0, x]$ produces a value $d_0$, with $x_0 < d_0 < x$, such that

(18)    $g'(d_0) = 0.$

A second application of Rolle's theorem to $g(t)$ on $[x, x_1]$ will produce a value $d_1$,
with $x < d_1 < x_1$, such that

(19)    $g'(d_1) = 0.$

Equations (18) and (19) show that the function $g'(t)$ is zero at $t = d_0$ and $t = d_1$.
A third use of Rolle's theorem, but this time applied to $g'(t)$ over $[d_0, d_1]$, produces a
value $c$ for which

(20)    $g^{(2)}(c) = 0.$

Now go back to (17) and compute the derivatives $g'(t)$ and $g''(t)$:

(21)    $g'(t) = f'(t) - P_1'(t) - E_1(x) \dfrac{(t - x_0) + (t - x_1)}{(x - x_0)(x - x_1)},$

(22)    $g''(t) = f''(t) - 0 - E_1(x) \dfrac{2}{(x - x_0)(x - x_1)}.$

In (22) we have used the fact that $P_1(t)$ is a polynomial of degree $N = 1$; hence its
second derivative is $P_1''(t) \equiv 0$. Evaluation of (22) at the point $t = c$ and using (20)
yields

(23)    $0 = f''(c) - E_1(x) \dfrac{2}{(x - x_0)(x - x_1)}.$

Solving (23) for $E_1(x)$ results in the desired form (16) for the remainder:

(24)    $E_1(x) = \dfrac{(x - x_0)(x - x_1) f^{(2)}(c)}{2!},$

and the proof is complete.


The next result addresses the special case when the nodes for the Lagrange polynomial
are equally spaced, $x_k = x_0 + hk$ for $k = 0, 1, \ldots, N$, and the polynomial
$P_N(x)$ is used only for interpolation inside the interval $[x_0, x_N]$.

Theorem 4.4 (Error Bounds for Lagrange Interpolation, Equally Spaced Nodes).
Assume that $f(x)$ is defined on $[a, b]$, which contains equally spaced nodes $x_k =
x_0 + hk$. Additionally, assume that $f(x)$ and the derivatives of $f(x)$, up to the order
$N + 1$, are continuous and bounded on the special subintervals $[x_0, x_1]$, $[x_0, x_2]$, and
$[x_0, x_3]$, respectively; that is,

(25)    $|f^{(N+1)}(x)| \le M_{N+1}$ for $x_0 \le x \le x_N$,

for $N = 1, 2, 3$. The error terms (16) corresponding to the cases $N = 1$, 2, and 3 have
the following useful bounds on their magnitude:

(26)    $|E_1(x)| \le \dfrac{h^2 M_2}{8}$ valid for $x \in [x_0, x_1]$,

(27)    $|E_2(x)| \le \dfrac{h^3 M_3}{9\sqrt{3}}$ valid for $x \in [x_0, x_2]$,

(28)    $|E_3(x)| \le \dfrac{h^4 M_4}{24}$ valid for $x \in [x_0, x_3]$.


Proof. We establish (26) and leave the others for the reader. Using the change of
variables $x - x_0 = t$ and $x - x_1 = t - h$, the error term $E_1(x)$ can be written as

(29)    $E_1(x) = E_1(x_0 + t) = \dfrac{(t^2 - ht) f^{(2)}(c)}{2!}$ for $0 \le t \le h$.

The bound for the derivative for this case is

(30)    $|f^{(2)}(c)| \le M_2$ for $x_0 \le c \le x_1$.

Now determine a bound for the expression $(t^2 - ht)$ in the numerator of (29); call
this term $\Phi(t) = t^2 - ht$. Since $\Phi'(t) = 2t - h$, there is one critical point $t = h/2$
that is the solution to $\Phi'(t) = 0$. The extreme values of $\Phi(t)$ over $[0, h]$ occur either
at an end point, $\Phi(0) = 0$, $\Phi(h) = 0$, or at the critical point, $\Phi(h/2) = -h^2/4$. Since
the latter value is the largest in magnitude, we have established the bound

(31)    $|\Phi(t)| = |t^2 - ht| \le \dfrac{|-h^2|}{4} = \dfrac{h^2}{4}$ for $0 \le t \le h$.

Using (30) and (31) to estimate the magnitude of the product in the numerator in (29)
results in

(32)    $|E_1(x)| = \dfrac{|\Phi(t)|\,|f^{(2)}(c)|}{2!} \le \dfrac{h^2 M_2}{8},$

and formula (26) is established.


Comparison of Accuracy and $O(h^{N+1})$

The significance of Theorem 4.4 is to understand a simple relationship between the
size of the error terms for linear, quadratic, and cubic interpolation. In each case the
error bound $|E_N(x)|$ depends on $h$ in two ways. First, $h^{N+1}$ is explicitly present so
that $|E_N(x)|$ is proportional to $h^{N+1}$. Second, the values $M_{N+1}$ generally depend on
$h$ and tend to $|f^{(N+1)}(x_0)|$ as $h$ goes to zero. Therefore, as $h$ goes to zero, $|E_N(x)|$
converges to zero with the same rapidity that $h^{N+1}$ converges to zero. The notation
$O(h^{N+1})$ is used when discussing this behavior. For example, the error bound (26)
can be expressed as

$|E_1(x)| = O(h^2)$ valid for $x \in [x_0, x_1]$.

The notation $O(h^2)$ stands in place of $h^2 M_2/8$ in relation (26) and is meant to convey
the idea that the bound for the error term is approximately a multiple of $h^2$; that is,

$|E_1(x)| \le C h^2 \approx O(h^2).$

As a consequence, if the derivatives of $f(x)$ are uniformly bounded on the interval
and $|h| < 1$, then choosing $N$ large will make $h^{N+1}$ small, and the higher-degree
approximating polynomial will have less error.

Figure 4.13 (a) The error function $E_2(x) = \cos(x) - P_2(x)$. (b) The error function
$E_3(x) = \cos(x) - P_3(x)$.
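The $O(h^2)$ behavior of linear interpolation can be observed numerically. The following MATLAB sketch is an illustration only: it assumes $f(x) = \cos(x)$ with nodes $0$ and $h$, takes $M_2 = 1$, and uses a fine sampling grid whose size (2001 points) is an arbitrary choice. Halving $h$ should divide the maximum error by a factor that approaches $2^2 = 4$.

```matlab
% Sketch: max error of linear interpolation of cos(x) on [0, h] as h is halved.
hs     = 1.2 ./ 2.^(0:4);            % h = 1.2, 0.6, 0.3, 0.15, 0.075
maxErr = zeros(size(hs));            % observed max |E_1(x)| on [0, h]
bound  = zeros(size(hs));            % bound (26) with M_2 = 1
for i = 1:length(hs)
    h  = hs(i);
    x  = linspace(0, h, 2001);       % sampling grid (assumed fine enough)
    P1 = cos(0) + (cos(h) - cos(0)) / h * x;   % line through the two nodes
    maxErr(i) = max(abs(cos(x) - P1));
    bound(i)  = h^2 * 1 / 8;
end
ratios = maxErr(1:end-1) ./ maxErr(2:end);     % successive error ratios
disp([hs' maxErr' bound']);
disp(ratios);
```

The observed error stays below the bound for every $h$, and the ratios move toward 4 as $h$ decreases, because $M_2$ settles toward $|f^{(2)}(x_0)|$.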


Example 4.8. Consider $y = f(x) = \cos(x)$ over $[0.0, 1.2]$. Use formulas (26) through
(28) and determine the error bounds for the Lagrange polynomials $P_1(x)$, $P_2(x)$, and $P_3(x)$
that were constructed in Examples 4.6 and 4.7.

First, determine the bounds $M_2$, $M_3$, and $M_4$ for the derivatives $|f^{(2)}(x)|$, $|f^{(3)}(x)|$,
and $|f^{(4)}(x)|$, respectively, taken over the interval $[0.0, 1.2]$:

$|f^{(2)}(x)| = |-\cos(x)| \le |-\cos(0.0)| = 1.000000 = M_2,$
$|f^{(3)}(x)| = |\sin(x)| \le |\sin(1.2)| = 0.932039 = M_3,$
$|f^{(4)}(x)| = |\cos(x)| \le |\cos(0.0)| = 1.000000 = M_4.$

For $P_1(x)$ the spacing of the nodes is $h = 1.2$, and its error bound is

(33)    $|E_1(x)| \le \dfrac{h^2 M_2}{8} \le \dfrac{(1.2)^2 (1.000000)}{8} = 0.180000.$

For $P_2(x)$ the spacing of the nodes is $h = 0.6$, and its error bound is

(34)    $|E_2(x)| \le \dfrac{h^3 M_3}{9\sqrt{3}} \le \dfrac{(0.6)^3 (0.932039)}{9\sqrt{3}} = 0.012915.$

For $P_3(x)$ the spacing of the nodes is $h = 0.4$, and its error bound is

(35)    $|E_3(x)| \le \dfrac{h^4 M_4}{24} \le \dfrac{(0.4)^4 (1.000000)}{24} = 0.001067.$

From Example 4.6 we saw that $|E_1(0.6)| = |\cos(0.6) - P_1(0.6)| = 0.144157$, so
the bound 0.180000 in (33) is reasonable. The graphs of the error functions $E_2(x) =
\cos(x) - P_2(x)$ and $E_3(x) = \cos(x) - P_3(x)$ are shown in Figure 4.13(a) and (b),
respectively, and numerical computations are given in Table 4.7. Using values in the
table, we find that $|E_2(1.0)| = |\cos(1.0) - P_2(1.0)| = 0.008416$ and $|E_3(0.2)| =
|\cos(0.2) - P_3(0.2)| = 0.000855$, which is in reasonable agreement with the bounds
0.012915 and 0.001067 given in (34) and (35), respectively.

Table 4.7 Comparison of $f(x) = \cos(x)$ and the Quadratic and Cubic Polynomial
Approximations $P_2(x)$ and $P_3(x)$
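The bounds of Example 4.8 can be checked against the observed maximum errors. The sketch below is only an illustration under stated assumptions: it builds each interpolant with the built-in polyfit (a degree-$N$ fit through $N + 1$ points reproduces the interpolating polynomial) rather than with the Lagrange form, and the 1201-point sampling grid is an arbitrary choice.

```matlab
% Sketch: compare the bounds (26)-(28) with observed errors for f(x) = cos(x).
hs     = [1.2 0.6 0.4];                  % node spacings for P_1, P_2, P_3
M      = [1.0 sin(1.2) 1.0];             % M_2, M_3, M_4 on [0, 1.2]
bounds = [hs(1)^2 * M(1) / 8, ...
          hs(2)^3 * M(2) / (9 * sqrt(3)), ...
          hs(3)^4 * M(3) / 24];
x      = linspace(0, 1.2, 1201);         % sampling grid (assumed fine enough)
maxErr = zeros(1, 3);
for N = 1:3
    X = (0:N) * hs(N);                   % N+1 equally spaced nodes on [0, 1.2]
    c = polyfit(X, cos(X), N);           % degree-N polynomial through the nodes
    maxErr(N) = max(abs(cos(x) - polyval(c, x)));
end
disp([bounds; maxErr]);                  % first row: bounds, second row: errors
```

Each observed maximum should fall below its bound, since the bounds replace $|f^{(N+1)}(c)|$ by its largest possible value on the interval.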


MATLAB 

The following program finds the collocation polynomial through a given set of points
by constructing a vector whose entries are the coefficients of the Lagrange interpolatory
polynomial. The program uses the commands poly and conv. The poly command
creates a vector whose entries are the coefficients of a polynomial with specified
roots. The conv command produces a vector whose entries are the coefficients of a
polynomial that is the product of two other polynomials.

Example 4.9. Find the product of two first-degree polynomials, $P(x)$ and $Q(x)$, with
roots 2 and 3, respectively.

>>P=poly(2)
P=
     1    -2
>>Q=poly(3)
Q=
     1    -3
>>conv(P,Q)
ans=
     1    -5     6

Thus the product of $P(x)$ and $Q(x)$ is $x^2 - 5x + 6$.

Program 4.1 (Lagrange Approximation). To evaluate the Lagrange polynomial
$P(x) = \sum_{k=0}^{N} y_k L_{N,k}(x)$ based on $N + 1$ points $(x_k, y_k)$ for $k = 0, 1, \ldots, N$.

function [C,L]=lagran(X,Y)
%Input  - X is a vector that contains a list of abscissas
%       - Y is a vector that contains a list of ordinates
%Output - C is a matrix that contains the coefficients of
%         the Lagrange interpolatory polynomial
%       - L is a matrix that contains the Lagrange
%         coefficient polynomials
w=length(X);
n=w-1;
L=zeros(w,w);
%Form the Lagrange coefficient polynomials
for k=1:n+1
   V=1;
   for j=1:n+1
      if k~=j
         V=conv(V,poly(X(j)))/(X(k)-X(j));
      end
   end
   L(k,:)=V;
end
%Determine the coefficients of the Lagrange interpolating
%polynomial
C=Y*L;
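As a usage illustration, the steps of Program 4.1 can be run inline on the quadratic data of Example 4.7. This is a sketch, not the program itself: the loop is written out as a script so that it runs on its own, and the check variables (resid, rowSum) are hypothetical names added here.

```matlab
% Sketch: the steps of Program 4.1 applied to the nodes 0.0, 0.6, 1.2 of cos(x).
X = [0.0 0.6 1.2];
Y = cos(X);
w = length(X);
L = zeros(w, w);                 % row k holds the coefficients of L_{N,k}(x)
for k = 1:w
    V = 1;
    for j = 1:w
        if k ~= j
            % multiply by (x - X(j)) and divide by (X(k) - X(j))
            V = conv(V, poly(X(j))) / (X(k) - X(j));
        end
    end
    L(k, :) = V;
end
C      = Y * L;                  % coefficients of P_2(x), highest power first
resid  = polyval(C, X) - Y;      % should vanish: P_2 interpolates the data
rowSum = sum(L, 1);              % coefficients of L_0 + L_1 + L_2, i.e. of 1
disp(C);
```

Because the rows of L are stored highest power first, C can be passed straight to polyval, for example polyval(C, 1.0) to reproduce $P_2(1.0)$.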


Exercises for Lagrange Approximation

1. Find Lagrange polynomials that approximate $f(x) = x^3$.

(a) Find the linear interpolation polynomial $P_1(x)$ using the nodes $x_0 = -3$ and
$x_1 = 0$.

(b) Find the quadratic interpolation polynomial $P_2(x)$ using the nodes $x_0 = -1$,
$x_1 = 0$, and $x_2 = 1$.

(c) Find the cubic interpolation polynomial $P_3(x)$ using the nodes $x_0 = -1$, $x_1 = 0$,
$x_2 = 1$, and $x_3 = 2$.

(d) Find the linear interpolation polynomial $P_1(x)$ using the nodes $x_0 = 1$ and
$x_1 =$ […].

(e) Find the quadratic interpolation polynomial $P_2(x)$ using the nodes $x_0 = 0$,
$x_1 = 1$, and $x_2 = 2$.

2. Let $f(x) = x + 2/x$.

(a) Use quadratic Lagrange interpolation based on the nodes $x_0 = 1$, $x_1 = 2$, and
$x_2 = 2.5$ to approximate $f(1.5)$ and $f(1.2)$.

(b) Use cubic Lagrange interpolation based on the nodes $x_0 = 0.5$, $x_1 = 1$, $x_2 =$ […],
and $x_3 = 2.5$ to approximate $f(1.5)$ and $f(1.2)$.

3. Let $f(x) = 2\sin(\pi x/6)$, where $x$ is in radians.

(a) Use quadratic Lagrange interpolation based on the nodes $x_0 = 0$, $x_1 = 1$, and
$x_2 = 3$ to approximate $f(2)$ and $f(2.4)$.

(b) Use cubic Lagrange interpolation based on the nodes $x_0 = 0$, $x_1 = 1$, $x_2 =$ […],
and $x_3 = 5$ to approximate $f(2)$ and $f(2.4)$.

4. Let $f(x) = 2\sin(\pi x/6)$, where $x$ is in radians.

(a) Use quadratic Lagrange interpolation based on the nodes $x_0 = 0$, $x_1 = 1$, and
$x_2 = 3$ to approximate $f(4)$ and $f(3.5)$.

(b) Use cubic Lagrange interpolation based on the nodes $x_0 = 0$, $x_1 = 1$, $x_2 =$ […],
and $x_3 = 5$ to approximate $f(4)$ and $f(3.5)$.

5. Write down the error term $E_3(x)$ for cubic Lagrange interpolation to $f(x)$, where
interpolation is to be exact at the four nodes $x_0 = -1$, $x_1 = 0$, $x_2 = 3$, and $x_3 =$ […],
and $f(x)$ is given by

(a) $f(x) = 4x^3 - 3x + 2$

(b) /(*) = * 4 -2r 5 

(c) fix) = * s - Sx* 

6. Let $f(x) = x^x$.

(a) Find the quadratic Lagrange polynomial $P_2(x)$ using the nodes $x_0 = 1$, $x_1 =
1.25$, and $x_2 = 1.5$.

(b) Use the polynomial from part (a) to estimate the average value of $f(x)$ over the
interval $[1, 1.5]$.

(c) Use expression (27) of Theorem 4.4 to obtain a bound on the error in approximating
$f(x)$ with $P_2(x)$.

7. Consider the Lagrange coefficient polynomials $L_{2,k}(x)$ that are used for quadratic
interpolation at the nodes $x_0$, $x_1$, and $x_2$. Define $g(x) = L_{2,0}(x) + L_{2,1}(x) +
L_{2,2}(x) - 1$.

(a) Show that $g$ is a polynomial of degree $\le 2$.

(b) Show that $g(x_k) = 0$ for $k = 0, 1, 2$.

(c) Show that $g(x) = 0$ for all $x$. Hint. Use the fundamental theorem of algebra.

8. Let $L_{N,0}(x)$, $L_{N,1}(x)$, ..., and $L_{N,N}(x)$ be the Lagrange coefficient polynomials
based on the $N + 1$ nodes $x_0, x_1, \ldots,$ and $x_N$. Show that $\sum_{k=0}^{N} L_{N,k}(x) = 1$ for any
real number $x$.

9. Let $f(x)$ be a polynomial of degree $\le N$. Let $P_N(x)$ be the Lagrange polynomial of
degree $\le N$ based on the $N + 1$ nodes $x_0, x_1, \ldots, x_N$. Show that $f(x) = P_N(x)$ for
all $x$. Hint. Show that the error term $E_N(x)$ is identically zero.

10. Consider the function $f(x) = \sin(x)$ on the interval $[0, 1]$. Use Theorem 4.4 to
determine the step size $h$ so that

(a) linear Lagrange interpolation has an accuracy of $10^{-6}$ (i.e., find $h$ such that
$|E_1(x)| < 5 \times 10^{-7}$).

(b) quadratic Lagrange interpolation has an accuracy of $10^{-6}$ (i.e., find $h$ such that
$|E_2(x)| < 5 \times 10^{-7}$).

(c) cubic Lagrange interpolation has an accuracy of $10^{-6}$ (i.e., find $h$ such that
$|E_3(x)| < 5 \times 10^{-7}$).

11. Start with equation (16) and $N = 2$, and prove inequality (27). Let $x_1 = x_0 + h$,
$x_2 = x_0 + 2h$. Prove that if $x_0 \le x \le x_2$ then

$|x - x_0|\,|x - x_1|\,|x - x_2| \le \dfrac{2h^3}{3 \times 3^{1/2}}.$

Hint. Use the substitutions $t = x - x_1$, $t + h = x - x_0$, and $t - h = x - x_2$ and the
function $v(t) = t^3 - t h^2$ on the interval $-h \le t \le h$. Set $v'(t) = 0$ and solve for $t$ in
terms of $h$.

12. Linear interpolation in two dimensions. Consider the polynomial $z = P(x, y) = A +
Bx + Cy$ that passes through the three points $(x_0, y_0, z_0)$, $(x_1, y_1, z_1)$, and $(x_2, y_2, z_2)$.
Then $A$, $B$, and $C$ are the solution values for the linear system of equations

$A + Bx_0 + Cy_0 = z_0$
$A + Bx_1 + Cy_1 = z_1$
$A + Bx_2 + Cy_2 = z_2.$

(a) Find $A$, $B$, and $C$ so that $z = P(x, y)$ passes through the points $(1, 1, 5)$,
$(2, 1, 3)$, and $(1, 2, 9)$.

(b) Find $A$, $B$, and $C$ so that $z = P(x, y)$ passes through the points $(1, 1, 2.5)$,
$(2, 1, 0)$, and $(1, 2, 4)$.

(c) Find $A$, $B$, and $C$ so that $z = P(x, y)$ passes through the points $(2, 1, 5)$,
$(1, 3, 7)$, and $(3, 2, 4)$.

(d) Can values $A$, $B$, and $C$ be found so that $z = P(x, y)$ passes through the points
$(1, 2, 5)$, $(3, 2, 7)$, and $(1, 2, 0)$? Why?

13. Use Theorem 1.7, the Generalized Rolle's Theorem, and the special function

$g(t) = f(t) - P_N(t) - E_N(x) \dfrac{(t - x_0)(t - x_1) \cdots (t - x_N)}{(x - x_0)(x - x_1) \cdots (x - x_N)},$

where $P_N(x)$ is the Lagrange polynomial of degree $N$, to prove that the error term
$E_N(x) = f(x) - P_N(x)$ has the form

$E_N(x) = (x - x_0)(x - x_1) \cdots (x - x_N) \dfrac{f^{(N+1)}(c)}{(N + 1)!}.$

Hint. Find $g^{(N+1)}(t)$ and then evaluate it at $t = c$.

Algorithms and Programs

1. Use Program 4.1 to find the coefficients of the interpolatory polynomials in Problem
2(i) a, b, and c in the Algorithms and Programs in Section 4.2. Plot the graphs
of each function and the associated interpolatory polynomial on the same coordinate
system.

2. The measured temperatures during a 5-hour period in a suburb of Los Angeles on
November 8 are given in the following table.

(a) Use Program 4.1 to construct a Lagrange interpolatory polynomial for the data
in the table.

(b) Use Algorithm 4.1(iii) to estimate the average temperature during the given
5-hour period.

(c) Graph the data in the table and the polynomial from part (a) on the same coordinate
system. Discuss the possible error that can result from using the polynomial
in part (a) to estimate the average temperature.



4.4 Newton Polynomials 

It is sometimes useful to find several approximating polynomials $P_1(x)$, $P_2(x)$, ...,
$P_N(x)$ and then choose the one that suits our needs. If the Lagrange polynomials
are used, there is no constructive relationship between $P_{N-1}(x)$ and $P_N(x)$. Each
polynomial has to be constructed individually, and the work required to compute the
higher-degree polynomials involves many computations. We take a new approach and
construct Newton polynomials that have the recursive pattern

(1)    $P_1(x) = a_0 + a_1(x - x_0),$

(2)    $P_2(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1),$

(3)    $P_3(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1)$
       $+ \, a_3(x - x_0)(x - x_1)(x - x_2),$

(4)    $P_N(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1)$
       $+ \, a_3(x - x_0)(x - x_1)(x - x_2)$
       $+ \, a_4(x - x_0)(x - x_1)(x - x_2)(x - x_3) + \cdots$
       $+ \, a_N(x - x_0) \cdots (x - x_{N-1}).$

Here the polynomial $P_N(x)$ is obtained from $P_{N-1}(x)$ using the recursive relationship

(5)    $P_N(x) = P_{N-1}(x) + a_N(x - x_0)(x - x_1)(x - x_2) \cdots (x - x_{N-1}).$

The polynomial (4) is said to be a Newton polynomial with $N$ centers $x_0, x_1,
\ldots, x_{N-1}$. It involves sums of products of linear factors up to

$a_N(x - x_0)(x - x_1)(x - x_2) \cdots (x - x_{N-1}),$

so $P_N(x)$ will simplify to be an ordinary polynomial of degree $\le N$.

Example 4.10. Given the centers $x_0 = 1$, $x_1 = 3$, $x_2 = 4$, and $x_3 = 4.5$ and the
coefficients $a_0 = 5$, $a_1 = -2$, $a_2 = 0.5$, $a_3 = -0.1$, and $a_4 = 0.003$, find $P_1(x)$, $P_2(x)$,
$P_3(x)$, and $P_4(x)$ and evaluate $P_k(2.5)$ for $k = 1, 2, 3, 4$.

Using formulas (1) through (4), we have

$P_1(x) = 5 - 2(x - 1),$
$P_2(x) = 5 - 2(x - 1) + 0.5(x - 1)(x - 3),$
$P_3(x) = P_2(x) - 0.1(x - 1)(x - 3)(x - 4),$
$P_4(x) = P_3(x) + 0.003(x - 1)(x - 3)(x - 4)(x - 4.5).$

Evaluating the polynomials at $x = 2.5$ results in

$P_1(2.5) = 5 - 2(1.5) = 2,$
$P_2(2.5) = P_1(2.5) + 0.5(1.5)(-0.5) = 1.625,$
$P_3(2.5) = P_2(2.5) - 0.1(1.5)(-0.5)(-1.5) = 1.5125,$
$P_4(2.5) = P_3(2.5) + 0.003(1.5)(-0.5)(-1.5)(-2.0) = 1.50575.$ ■
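The recursive relationship (5) translates directly into a running product. The MATLAB sketch below, with illustrative variable names (term, value) that are not taken from the text, reproduces the four values of Example 4.10 at $x = 2.5$.

```matlab
% Sketch: evaluate P_1(x), ..., P_4(x) of Example 4.10 with the recursion (5).
centers = [1 3 4 4.5];            % x_0, x_1, x_2, x_3
a       = [5 -2 0.5 -0.1 0.003];  % a_0, ..., a_4
x       = 2.5;
P       = zeros(1, 4);            % P(k) will hold P_k(x)
term    = 1;                      % running product (x - x_0)...(x - x_{k-1})
value   = a(1);                   % start from a_0
for k = 1:4
    term  = term * (x - centers(k));
    value = value + a(k+1) * term;    % P_k(x) = P_{k-1}(x) + a_k * product
    P(k)  = value;
end
disp(P);
```

Each new polynomial costs one extra multiplication for the product and one multiply-add for the coefficient, which is the practical advantage over rebuilding a Lagrange polynomial from scratch.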

Nested Multiplication

If $N$ is fixed and the polynomial $P_N(x)$ is evaluated many times, then nested multiplication
should be used. The process is similar to nested multiplication for ordinary
polynomials, except that the centers $x_k$ must be subtracted from the independent variable
$x$. The nested multiplication form for $P_3(x)$ is

(6)    $P_3(x) = ((a_3(x - x_2) + a_2)(x - x_1) + a_1)(x - x_0) + a_0.$

To evaluate $P_3(x)$ for a given value of $x$, start with the innermost grouping and form
successively the quantities

(7)    $S_3 = a_3,$
       $S_2 = S_3(x - x_2) + a_2,$
       $S_1 = S_2(x - x_1) + a_1,$
       $S_0 = S_1(x - x_0) + a_0.$

The quantity $S_0$ is now $P_3(x)$.

Example 4.11. Compute $P_3(2.5)$ in Example 4.10 using nested multiplication.
Using (6), we write

$P_3(x) = ((-0.1(x - 4) + 0.5)(x - 3) - 2)(x - 1) + 5.$

The values in (7) are

$S_3 = -0.1,$
$S_2 = -0.1(2.5 - 4) + 0.5 = 0.65,$
$S_1 = 0.65(2.5 - 3) - 2 = -2.325,$
$S_0 = -2.325(2.5 - 1) + 5 = 1.5125.$

Therefore, $P_3(2.5) = 1.5125$. ■
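The scheme (7) generalizes to any degree with a single backward loop. The following MATLAB sketch assumes the coefficients and centers are stored in vectors a and centers (names chosen here for illustration) and repeats the computation of Example 4.11.

```matlab
% Sketch: nested multiplication for a Newton polynomial of degree N.
centers = [1 3 4];              % x_0, x_1, x_2
a       = [5 -2 0.5 -0.1];      % a_0, a_1, a_2, a_3
x       = 2.5;
N       = length(a) - 1;
S       = a(N+1);               % S_N = a_N
for k = N:-1:1
    % S_{k-1} = S_k * (x - x_{k-1}) + a_{k-1}, shifted to 1-based indexing
    S = S * (x - centers(k)) + a(k);
end
disp(S);                        % S now holds P_3(2.5)
```

The loop uses $N$ multiplications and $2N$ additions or subtractions, compared with the growing products needed when each term of (4) is formed separately.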


Polynomial Approximation, Nodes, and Centers 

Suppose that we want to find the coefficients $a_k$ for all the polynomials $P_1(x)$, ...,
$P_N(x)$ that approximate a given function $f(x)$. Then $P_k(x)$ will be based on the centers
$x_0, x_1, \ldots, x_{k-1}$ and have the nodes $x_0, x_1, \ldots, x_k$. For the polynomial $P_1(x)$ the
coefficients $a_0$ and $a_1$ have a familiar meaning. In this case

(8)    $P_1(x_0) = f(x_0)$ and $P_1(x_1) = f(x_1).$

Using (1) and (8) to solve for $a_0$, we find that

(9)    $f(x_0) = P_1(x_0) = a_0 + a_1(x_0 - x_0) = a_0.$

Hence $a_0 = f(x_0)$. Next, using (1), (8), and (9), we have

$f(x_1) = P_1(x_1) = a_0 + a_1(x_1 - x_0) = f(x_0) + a_1(x_1 - x_0),$

which can be solved for $a_1$, and we get

(10)    $a_1 = \dfrac{f(x_1) - f(x_0)}{x_1 - x_0}.$

Hence $a_1$ is the slope of the secant line passing through the two points $(x_0, f(x_0))$
and $(x_1, f(x_1))$.

The coefficients $a_0$ and $a_1$ are the same for both $P_1(x)$ and $P_2(x)$. Evaluating (2)
at the node $x_2$, we find that

(11)    $f(x_2) = P_2(x_2) = a_0 + a_1(x_2 - x_0) + a_2(x_2 - x_0)(x_2 - x_1).$

The values for $a_0$ and $a_1$ in (9) and (10) can be used in (11) to obtain

$a_2 = \dfrac{f(x_2) - a_0 - a_1(x_2 - x_0)}{(x_2 - x_0)(x_2 - x_1)} = \left( \dfrac{f(x_2) - f(x_0)}{x_2 - x_0} - \dfrac{f(x_1) - f(x_0)}{x_1 - x_0} \right) \Big/ (x_2 - x_1).$

For computational purposes we prefer to write this last quantity as

(12)    $a_2 = \left( \dfrac{f(x_2) - f(x_1)}{x_2 - x_1} - \dfrac{f(x_1) - f(x_0)}{x_1 - x_0} \right) \Big/ (x_2 - x_0).$

The two formulas for $a_2$ can be shown to be equivalent by writing the quotients
over the common denominator $(x_2 - x_1)(x_2 - x_0)(x_1 - x_0)$. The details are left for
the reader. The numerator in (12) is the difference between the first-order divided
differences. In order to proceed, we need to introduce the idea of divided differences.
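One way to fill in the omitted algebra is sketched below. The shorthand $f_i = f(x_i)$ is introduced here only for brevity and is not notation from the text. Call the first expression for $a_2$ (the one with $x_2 - x_1$ as outer divisor) $a_2^{(A)}$ and the form preferred for computation (the one with $x_2 - x_0$ as outer divisor) $a_2^{(B)}$. Multiplying each by $(x_2 - x_0)(x_2 - x_1)$ and subtracting shows that they agree:

```latex
\begin{aligned}
a_2^{(A)} &= \left(\frac{f_2 - f_0}{x_2 - x_0} - \frac{f_1 - f_0}{x_1 - x_0}\right)\frac{1}{x_2 - x_1},
\qquad
a_2^{(B)} = \left(\frac{f_2 - f_1}{x_2 - x_1} - \frac{f_1 - f_0}{x_1 - x_0}\right)\frac{1}{x_2 - x_0}, \\[4pt]
(x_2 - x_0)(x_2 - x_1)\bigl(a_2^{(A)} - a_2^{(B)}\bigr)
&= \Bigl[(f_2 - f_0) - \frac{(f_1 - f_0)(x_2 - x_0)}{x_1 - x_0}\Bigr]
 - \Bigl[(f_2 - f_1) - \frac{(f_1 - f_0)(x_2 - x_1)}{x_1 - x_0}\Bigr] \\
&= (f_1 - f_0) - \frac{(f_1 - f_0)\bigl[(x_2 - x_0) - (x_2 - x_1)\bigr]}{x_1 - x_0} \\
&= (f_1 - f_0) - (f_1 - f_0) = 0 .
\end{aligned}
```

Since the nodes are distinct, the factor $(x_2 - x_0)(x_2 - x_1)$ is nonzero, so the two expressions for $a_2$ are equal.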

Definition 4.1 (Divided Differences). The divided differences for a function $f(x)$
are defined as follows:

(13)    $f[x_k] = f(x_k),$

        $f[x_{k-1}, x_k] = \dfrac{f[x_k] - f[x_{k-1}]}{x_k - x_{k-1}},$

        $f[x_{k-2}, x_{k-1}, x_k] = \dfrac{f[x_{k-1}, x_k] - f[x_{k-2}, x_{k-1}]}{x_k - x_{k-2}},$

        $f[x_{k-3}, x_{k-2}, x_{k-1}, x_k] = \dfrac{f[x_{k-2}, x_{k-1}, x_k] - f[x_{k-3}, x_{k-2}, x_{k-1}]}{x_k - x_{k-3}}.$

The recursive rule for constructing higher-order divided differences is

(14)    $f[x_{k-j}, x_{k-j+1}, \ldots, x_k] = \dfrac{f[x_{k-j+1}, \ldots, x_k] - f[x_{k-j}, \ldots, x_{k-1}]}{x_k - x_{k-j}}$

and is used to construct the divided differences in Table 4.8.

The coefficients $a_k$ of $P_N(x)$ depend on the values $f(x_j)$, for $j = 0, 1, \ldots, k$. The
next theorem shows that $a_k$ can be computed using divided differences:

(15)    $a_k = f[x_0, x_1, \ldots, x_k].$

Table 4.8 Divided-Difference Table for $y = f(x)$

x_k    f[x_k]    f[ , ]        f[ , , ]          f[ , , , ]            f[ , , , , ]
x_0    f[x_0]
x_1    f[x_1]    f[x_0,x_1]
x_2    f[x_2]    f[x_1,x_2]    f[x_0,x_1,x_2]
x_3    f[x_3]    f[x_2,x_3]    f[x_1,x_2,x_3]    f[x_0,x_1,x_2,x_3]
x_4    f[x_4]    f[x_3,x_4]    f[x_2,x_3,x_4]    f[x_1,x_2,x_3,x_4]    f[x_0,x_1,x_2,x_3,x_4]


Theorem 4.5 (Newton Polynomial). Suppose that $x_0, x_1, \ldots, x_N$ are $N + 1$ distinct
numbers in $[a, b]$. There exists a unique polynomial $P_N(x)$ of degree at most $N$ with
the property that

$f(x_j) = P_N(x_j)$ for $j = 0, 1, \ldots, N$.

The Newton form of this polynomial is

(16)    $P_N(x) = a_0 + a_1(x - x_0) + \cdots + a_N(x - x_0)(x - x_1) \cdots (x - x_{N-1}),$

where $a_k = f[x_0, x_1, \ldots, x_k]$, for $k = 0, 1, \ldots, N$.

Remark. If $\{(x_j, y_j)\}_{j=0}^{N}$ is a set of points whose abscissas are distinct, the values
$f(x_j) = y_j$ can be used to construct the unique polynomial of degree $\le N$ that passes
through the $N + 1$ points.

Corollary 4.2 (Newton Approximation). Assume that $P_N(x)$ is the Newton polynomial
given in Theorem 4.5 and is used to approximate the function $f(x)$, that is,

(17)    $f(x) = P_N(x) + E_N(x).$

If $f \in C^{N+1}[a, b]$, then for each $x \in [a, b]$ there corresponds a number $c = c(x)$ in
$(a, b)$, so that the error term has the form

(18)    $E_N(x) = \dfrac{(x - x_0)(x - x_1) \cdots (x - x_N) f^{(N+1)}(c)}{(N + 1)!}.$

Remark. The error term $E_N(x)$ is the same as the one for Lagrange interpolation, which
was introduced in equation (16) of Section 4.3.

It is of interest to start with a known function $f(x)$ that is a polynomial of degree $N$
and compute its divided-difference table. In this case we know that $f^{(N+1)}(x) = 0$
for all $x$, and calculation will reveal that the $(N + 1)$st divided difference is zero.
This will happen because the divided difference (14) is proportional to a numerical
approximation for the $j$th derivative.


Table 4.9

                      First      Second     Third      Fourth     Fifth
                      divided    divided    divided    divided    divided
x_k        f[x_k]     difference difference difference difference difference
x_0 = 1      -3
x_1 = 2       0          3
x_2 = 3      15         15          6
x_3 = 4      48         33          9          1
x_4 = 5     105         57         12          1          0
x_5 = 6     192         87         15          1          0          0

Table 4.10 Divided-Difference Table Used for Constructing the Newton Polynomials
$P_k(x)$ in Example 4.13

x_k          f[x_k]        f[ , ]        f[ , , ]      f[ , , , ]    f[ , , , , ]
x_0 = 0.0     1.0000000
x_1 = 1.0     0.5403023    -0.4596977
x_2 = 2.0    -0.4161468    -0.9564491    -0.2483757
x_3 = 3.0    -0.9899925    -0.5738457     0.1913017     0.1465592
x_4 = 4.0    -0.6536436     0.3363489     0.4550973     0.0879318    -0.0146568


Example 4.12. Let $f(x) = x^3 - 4x$. Construct the divided-difference table based on the
nodes $x_0 = 1$, $x_1 = 2$, ..., $x_5 = 6$, and find the Newton polynomial $P_3(x)$ based on $x_0$, $x_1$,
$x_2$, and $x_3$.

See Table 4.9.

The coefficients $a_0 = -3$, $a_1 = 3$, $a_2 = 6$, and $a_3 = 1$ of $P_3(x)$ appear on the
diagonal of the divided-difference table. The centers $x_0 = 1$, $x_1 = 2$, and $x_2 = 3$ are
the values in the first column. Using formula (3), we write

$P_3(x) = -3 + 3(x - 1) + 6(x - 1)(x - 2) + (x - 1)(x - 2)(x - 3).$
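The table of Example 4.12 can be regenerated in a few lines. The MATLAB sketch below follows the recursive rule for divided differences with 1-based array indices; the array name D and the vector a are illustrative choices. Because $f(x) = x^3 - 4x$ is a cubic, the fourth and fifth divided differences come out as zero.

```matlab
% Sketch: divided-difference table for f(x) = x^3 - 4x at the nodes 1, ..., 6.
X = 1:6;
Y = X.^3 - 4*X;
n = length(X);
D = zeros(n, n);
D(:, 1) = Y';                         % zeroth column: f[x_k]
for j = 2:n
    for k = j:n
        % difference of two neighbors divided by the span of their nodes
        D(k, j) = (D(k, j-1) - D(k-1, j-1)) / (X(k) - X(k-j+1));
    end
end
a = diag(D)';                         % Newton coefficients a_0, ..., a_5
disp(D);
disp(a);
```

The diagonal of D reproduces the coefficients $-3, 3, 6, 1$ read off Table 4.9, followed by two zeros.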

Example 4.13. Construct a divided-difference table for $f(x) = \cos(x)$ based on the five
points $(k, \cos(k))$, for $k = 0, 1, 2, 3, 4$. Use it to find the coefficients $a_k$ and the four
Newton interpolating polynomials $P_k(x)$, for $k = 1, 2, 3, 4$.

For simplicity we round off the values to seven decimal places, which are displayed
in Table 4.10. The nodes $x_0, x_1, x_2, x_3$ and the diagonal elements $a_0, a_1, a_2, a_3, a_4$ in
Table 4.10 are used in formula (16), and we write down the first four Newton polynomials:

$P_1(x) = 1.0000000 - 0.4596977(x - 0.0),$

$P_2(x) = 1.0000000 - 0.4596977(x - 0.0) - 0.2483757(x - 0.0)(x - 1.0),$

$P_3(x) = 1.0000000 - 0.4596977(x - 0.0) - 0.2483757(x - 0.0)(x - 1.0)$
$+ \, 0.1465592(x - 0.0)(x - 1.0)(x - 2.0),$

$P_4(x) = 1.0000000 - 0.4596977(x - 0.0) - 0.2483757(x - 0.0)(x - 1.0)$
$+ \, 0.1465592(x - 0.0)(x - 1.0)(x - 2.0)$
$- \, 0.0146568(x - 0.0)(x - 1.0)(x - 2.0)(x - 3.0).$

The following sample calculation shows how to find the coefficient $a_2$.

$f[x_0, x_1] = \dfrac{f[x_1] - f[x_0]}{x_1 - x_0} = \dfrac{0.5403023 - 1.0000000}{1.0 - 0.0} = -0.4596977,$

$f[x_1, x_2] = \dfrac{f[x_2] - f[x_1]}{x_2 - x_1} = \dfrac{-0.4161468 - 0.5403023}{2.0 - 1.0} = -0.9564491,$

$a_2 = f[x_0, x_1, x_2] = \dfrac{f[x_1, x_2] - f[x_0, x_1]}{x_2 - x_0} = \dfrac{-0.9564491 + 0.4596977}{2.0 - 0.0} = -0.2483757.$

The graphs of $y = \cos(x)$ and $y = P_1(x)$, $y = P_2(x)$, and $y = P_3(x)$ are shown in
Figure 4.14(a), (b), and (c), respectively.

Figure 4.14 (a) Graphs of $y = \cos(x)$ and the linear Newton polynomial $y = P_1(x)$
based on the nodes $x_0 = 0.0$ and $x_1 = 1.0$.

Figure 4.14 (b) Graphs of $y = \cos(x)$ and the quadratic Newton polynomial
$y = P_2(x)$ based on the nodes $x_0 = 0.0$, $x_1 = 1.0$, and $x_2 = 2.0$.

For computational purposes the divided differences in Table 4.8 need to be stored in an
array which is chosen to be $D(k, j)$. Thus (15) becomes

(19)    $D(k, j) = f[x_{k-j}, x_{k-j+1}, \ldots, x_k]$ for $j \le k$.

Relation (14) is used to obtain the formula to recursively compute the entries in the array:

(20)    $D(k, j) = \dfrac{D(k, j-1) - D(k-1, j-1)}{x_k - x_{k-j}}.$

Notice that the value $a_k$ in (15) is the diagonal element $a_k = D(k, k)$. The algorithm for
computing the divided differences and evaluating $P_N(x)$ is now given. We remark that
Problem 2 in Algorithms and Programs investigates how to modify the algorithm so that
the values $\{a_k\}$ are computed using a one-dimensional array. ■
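One possible way to compute the coefficients in a one-dimensional array is sketched below; it is an assumption about how such a modification could look, not the solution given in the text. The vector a starts as the ordinates and is overwritten in place, sweeping from the bottom upward so that lower-order entries are still available when they are needed. The data are those of Example 4.13.

```matlab
% Sketch: Newton coefficients of cos(x) at x = 0, 1, 2, 3, 4 in a single vector.
X = 0:4;
a = cos(X);                 % a(k) starts as f[x_{k-1}] (1-based indexing)
n = length(X);
for j = 2:n
    for k = n:-1:j          % bottom-up sweep keeps a(k-1) at the previous order
        a(k) = (a(k) - a(k-1)) / (X(k) - X(k-j+1));
    end
end
disp(a);                    % a now holds a_0, a_1, a_2, a_3, a_4
```

After the loops, entry a(k) plays the role of the diagonal element of the two-dimensional table, so only $n$ numbers are stored instead of $n^2$.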


Program 4.2 (Newton Interpolation Polynomial). To construct and evaluate the
Newton polynomial of degree $\le N$ that passes through $(x_k, y_k) = (x_k, f(x_k))$ for
$k = 0, 1, \ldots, N$:

$P(x) = d_{0,0} + d_{1,1}(x - x_0) + d_{2,2}(x - x_0)(x - x_1)$
$+ \cdots + d_{N,N}(x - x_0)(x - x_1) \cdots (x - x_{N-1}),$

where

$d_{k,0} = y_k$ and $d_{k,j} = \dfrac{d_{k,j-1} - d_{k-1,j-1}}{x_k - x_{k-j}}.$

function [C,D]=newpoly(X,Y)
%Input  - X is a vector that contains a list of abscissas
%       - Y is a vector that contains a list of ordinates
%Output - C is a vector that contains the coefficients
%         of the Newton interpolatory polynomial
%       - D is the divided-difference table
n=length(X);
D=zeros(n,n);
D(:,1)=Y';
%Use formula (20) to form the divided-difference table
for j=2:n
   for k=j:n
      D(k,j)=(D(k,j-1)-D(k-1,j-1))/(X(k)-X(k-j+1));
   end
end
%Determine the coefficients of the Newton interpolating
%polynomial
C=D(n,n);
for k=(n-1):-1:1
   C=conv(C,poly(X(k)));
   m=length(C);
   C(m)=C(m)+D(k,k);
end
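The final loop of Program 4.2 applies nested multiplication to coefficient vectors rather than to numbers, which turns the Newton form into ordinary power-form coefficients. The sketch below writes the same steps as a standalone script for the first four nodes of Example 4.12; since $f(x) = x^3 - 4x$ is itself a cubic, the result should be the coefficient vector of $f$.

```matlab
% Sketch: Newton form to power form for f(x) = x^3 - 4x at the nodes 1, 2, 3, 4.
X = 1:4;
Y = X.^3 - 4*X;
n = length(X);
D = zeros(n, n);
D(:, 1) = Y';
for j = 2:n                           % divided-difference table
    for k = j:n
        D(k, j) = (D(k, j-1) - D(k-1, j-1)) / (X(k) - X(k-j+1));
    end
end
C = D(n, n);                          % start with the highest coefficient a_N
for k = (n-1):-1:1
    C = conv(C, poly(X(k)));          % multiply by (x - x_{k-1})
    C(end) = C(end) + D(k, k);        % add a_{k-1} to the constant term
end
disp(C);                              % power-form coefficients, highest first
```

The returned vector can be used with polyval in the same way as the output of the Lagrange program.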


Exercises for Newton Polynomials

In Exercises 1 through 4, use the centers $x_0$, $x_1$, $x_2$, and $x_3$ and the coefficients $a_0$, $a_1$, $a_2$, $a_3$,
and $a_4$ to find the Newton polynomials $P_1(x)$, $P_2(x)$, $P_3(x)$, and $P_4(x)$, and evaluate them
at the value $x = c$. Hint. Use equations (1) through (4) and the techniques of Example 4.10.

1.  a_0 = 4     a_1 = -1    a_2 = 0.4      a_3 = 0.01    a_4 = -0.002
    x_0 = 1     x_1 = 3     x_2 = 4        x_3 = […]     c = 2.5

2.  a_0 = 5     a_1 = -2    a_2 = 0.5      a_3 = -0.1    a_4 = 0.003
    x_0 = 0     x_1 = 1     x_2 = 2        x_3 = […]     c = 2.5

3.  a_0 = 7     a_1 = 3     a_2 = 0.1      a_3 = 0.05    a_4 = -0.04
    x_0 = -1    x_1 = 0     x_2 = 1        x_3 = […]     c = 3

4.  a_0 = -2    a_1 = 4     a_2 = -0.04    a_3 = 0.06    a_4 = 0.005
    x_0 = -3    x_1 = -1    x_2 = 1        x_3 = 4       c = 2


In Exercises 5 through 8:

(a) Compute the divided-difference table for the tabulated function.

(b) Write down the Newton polynomials $P_1(x)$, $P_2(x)$, $P_3(x)$, and $P_4(x)$.

(c) Evaluate the Newton polynomials in part (b) at the given values of $x$.

(d) Compare the values in part (c) with the actual function value $f(x)$.

5.  f(x) = x^(1/2)                    6.  f(x) = 3.6/x
    x = 4.5, 7.5                          x = 2.5, 3.5

    k    x_k    f(x_k)                    k    x_k    f(x_k)
    0    4.0    2.00000                   0    1.0    3.60
    1    5.0    2.23607                   1    2.0    1.80
    2    6.0    2.44949                   2    3.0    1.20
    3    7.0    2.64575                   3    4.0    0.90
    4    8.0    2.82843                   4    5.0    0.72

7.  f(x) = 3 sin^2(pi x/6)            8.  f(x) = e^(-x)
    x = 1.5, 3.5                          x = 0.5, 1.5

    k    x_k    f(x_k)                    k    x_k    f(x_k)
    0    0.0    0.00                      0    0.0    1.00000
    1    1.0    0.75                      1    1.0    0.36788
    2    2.0    2.25                      2    2.0    0.13534
    3    3.0    3.00                      3    3.0    0.04979
    4    4.0    2.25                      4    4.0    0.01832


9. Consider the $M + 1$ points $(x_0, y_0), \ldots, (x_M, y_M)$.

(a) If the $(N + 1)$st divided differences are zero, then show that the $(N + 2)$nd up
to the $M$th divided differences are zero.

(b) If the $(N + 1)$st divided differences are zero, then show that there exists a polynomial
$P_N(x)$ of degree $N$ such that

$P_N(x_k) = y_k$ for $k = 0, 1, \ldots, M$.

In Exercises 10 through 12, use the result of Exercise 9 to find the polynomial $P_N(x)$ that
goes through the $M + 1$ points $(N < M)$.

10.  x_k   y_k          11.  x_k   y_k          12.  x_k   y_k

13. Use Corollary 4.2 to find a bound on the maximum error ($|E_2(x)|$) on the interval
$[0, \pi]$, when the Newton interpolatory polynomial $P_2(x)$ is used to approximate
$f(x) = \cos(\pi x)$ at the centers $x_0 = 0$, $x_1 = \pi/2$, and $x_2 = \pi$.

4.5 Chebyshev Polynomials

Table 4.11 Chebyshev Polynomials
$T_0(x)$ through $T_7(x)$

$T_0(x) = 1$
$T_1(x) = x$
$T_2(x) = 2x^2 - 1$
$T_3(x) = 4x^3 - 3x$
$T_4(x) = 8x^4 - 8x^2 + 1$
$T_5(x) = 16x^5 - 20x^3 + 5x$
$T_6(x) = 32x^6 - 48x^4 + 18x^2 - 1$
$T_7(x) = 64x^7 - 112x^5 + 56x^3 - 7x$

our task is to follow Chebyshev's derivation on how to select the set of nodes $\{x_k\}_{k=0}^{N}$
that minimizes $\max_{-1 \le x \le 1}\{|Q(x)|\}$. This leads us to a discussion of Chebyshev polynomials
and some of their properties. To begin, the first eight Chebyshev polynomials
are listed in Table 4.11.

Properties of Chebyshev Polynomials

Property 1. Recurrence Relation

Chebyshev polynomials can be generated in the following way. Set $T_0(x) = 1$ and $T_1(x) = x$ and use the recurrence relation

(3)  $T_k(x) = 2xT_{k-1}(x) - T_{k-2}(x)$ for $k = 2, 3, \ldots$.

Property 2. Leading Coefficient

The coefficient of $x^N$ in $T_N(x)$ is $2^{N-1}$ when $N \ge 1$.

Property 3. Symmetry

When $N = 2M$, $T_{2M}(x)$ is an even function, that is,

(4)  $T_{2M}(-x) = T_{2M}(x)$.

When $N = 2M + 1$, $T_{2M+1}(x)$ is an odd function, that is,

(5)  $T_{2M+1}(-x) = -T_{2M+1}(x)$.

Property 4. Trigonometric Representation on [-1, 1]

(6)  $T_N(x) = \cos(N \arccos(x))$ for $-1 \le x \le 1$.

Figure 4.15  Graphs of the Chebyshev polynomials $T_0(x), T_1(x), \ldots, T_4(x)$ over $[-1, 1]$.

Property 5. Distinct Zeros in [-1, 1]

$T_N(x)$ has $N$ distinct zeros $x_k$ that lie in the interval $[-1, 1]$ (see Figure 4.15):

(7)  $x_k = \cos\left(\frac{(2k + 1)\pi}{2N}\right)$ for $k = 0, 1, \ldots, N - 1$.

These values are called the Chebyshev abscissas (nodes).

Property 6. Extreme Values

(8)  $|T_N(x)| \le 1$ for $-1 \le x \le 1$.
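
Properties 1, 4, and 5 can be checked numerically. The following script is only a sketch under our own conventions (the variable names `T`, `T7`, `residual`, and `trigGap` are not part of the text): it builds the coefficient rows of $T_0(x)$ through $T_7(x)$ from the recurrence (3), then tests the zeros in (7) and the trigonometric form (6) for $N = 7$.

```matlab
% Row k+1 of T holds the coefficients of T_k(x), highest power first
% (the convention expected by polyval).
nmax = 7;
T = zeros(nmax+1, nmax+1);
T(1, end) = 1;                 % T_0(x) = 1
T(2, end-1) = 1;               % T_1(x) = x
for k = 2:nmax
    prev1 = T(k, :);           % coefficients of T_{k-1}
    prev2 = T(k-1, :);         % coefficients of T_{k-2}
    % Multiplying by x shifts the coefficients one place to the left.
    T(k+1, :) = 2*[prev1(2:end) 0] - prev2;
end
T7 = T(8, :);                  % compare with the last row of Table 4.11

% Property 5: the values in (7) should be zeros of T_7.
N = 7;
kk = 0:N-1;
zerosT7 = cos((2*kk + 1)*pi/(2*N));
residual = max(abs(polyval(T7, zerosT7)));

% Property 4: T_7(x) should agree with cos(7*arccos(x)) on [-1, 1].
x = linspace(-1, 1, 201);
trigGap = max(abs(polyval(T7, x) - cos(N*acos(x))));
```

Both `residual` and `trigGap` should be at the level of rounding error.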


Let us show that $T_3(x) = 2xT_2(x) - T_1(x)$. Using the expressions for $T_1(x)$ and $T_2(x)$ in Table 4.11, we obtain

$2xT_2(x) - T_1(x) = 2x(2x^2 - 1) - x = 4x^3 - 3x = T_3(x)$.

Property 2 is proved by observing that the recurrence relation doubles the leading coefficient of $T_{N-1}(x)$ to get the leading coefficient of $T_N(x)$.

Property 3 is established by showing that $T_{2M}(x)$ involves only even powers of $x$ and $T_{2M+1}(x)$ involves only odd powers of $x$. The details are left for the reader.

The proof of Property 4 uses the trigonometric identity

$\cos(k\theta) = \cos(2\theta)\cos((k - 2)\theta) - \sin(2\theta)\sin((k - 2)\theta)$.

Substitute $\cos(2\theta) = 2\cos^2(\theta) - 1$ and $\sin(2\theta) = 2\sin(\theta)\cos(\theta)$ and get

$\cos(k\theta) = 2\cos(\theta)\left(\cos(\theta)\cos((k - 2)\theta) - \sin(\theta)\sin((k - 2)\theta)\right) - \cos((k - 2)\theta)$,

which is simplified as

$\cos(k\theta) = 2\cos(\theta)\cos((k - 1)\theta) - \cos((k - 2)\theta)$.

Finally, substitute $\theta = \arccos(x)$ and obtain

(9)  $2x\cos((k - 1)\arccos(x)) - \cos((k - 2)\arccos(x)) = \cos(k\arccos(x))$ for $-1 \le x \le 1$.

The first two Chebyshev polynomials are $T_0(x) = \cos(0\arccos(x)) = 1$ and $T_1(x) = \cos(1\arccos(x)) = x$. Now assume that $T_k(x) = \cos(k\arccos(x))$ for $k = 2, 3, \ldots, N - 1$. Formula (3) is used with (9) to establish the general case:

$T_N(x) = 2xT_{N-1}(x) - T_{N-2}(x)$
$\quad = 2x\cos((N - 1)\arccos(x)) - \cos((N - 2)\arccos(x))$
$\quad = \cos(N\arccos(x))$ for $-1 \le x \le 1$.

Properties 5 and 6 are consequences of Property 4.

Minimax 

The Russian mathematician Chebyshev studied how to minimize the upper bound for $|E_N(x)|$. One upper bound can be formed by taking the product of the maximum value of $|Q(x)|$ over all $x$ in $[-1, 1]$ and the maximum value of $|f^{(N+1)}(x)/(N + 1)!|$ over all $x$ in $[-1, 1]$. To minimize the factor $\max\{|Q(x)|\}$, Chebyshev discovered that $x_0, x_1, \ldots, x_N$ should be chosen so that $Q(x) = (1/2^N)T_{N+1}(x)$.

Theorem 4.6. Assume that $N$ is fixed. Among all possible choices for $Q(x)$ in equation (2), and thus among all possible choices for the distinct nodes $\{x_k\}_{k=0}^{N}$ in $[-1, 1]$, the polynomial $T(x) = T_{N+1}(x)/2^N$ is the unique choice that has the property

$\max_{-1 \le x \le 1}\{|T(x)|\} \le \max_{-1 \le x \le 1}\{|Q(x)|\}$.

Moreover,

(10)  $\max_{-1 \le x \le 1}\{|T(x)|\} = \frac{1}{2^N}$.

Proof. The proof can be found in Reference [29]. •
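
Theorem 4.6 can be illustrated for $N = 3$ by comparing $\max\{|Q(x)|\}$ for two node sets. The script below is a sketch, not a proof; it assumes $Q(x) = (x - x_0)(x - x_1)\cdots(x - x_N)$ and samples $[-1, 1]$ on a fine grid, and the names `maxEqual` and `maxCheb` are ours.

```matlab
N = 3;
x = linspace(-1, 1, 20001);                    % sampling grid on [-1, 1]
equalNodes = linspace(-1, 1, N+1);             % -1, -1/3, 1/3, 1
chebNodes = cos((2*(0:N) + 1)*pi/(2*N + 2));   % zeros of T_4(x)

% poly(r) returns the monic polynomial whose roots are r.
Qequal = polyval(poly(equalNodes), x);
Qcheb = polyval(poly(chebNodes), x);

maxEqual = max(abs(Qequal));   % analytically 16/81, about 0.1975
maxCheb = max(abs(Qcheb));     % equation (10) predicts 1/2^3 = 0.125
```

For the equally spaced nodes, $Q(x) = (x^2 - 1)(x^2 - 1/9)$, whose largest magnitude on $[-1, 1]$ is $16/81$, so the Chebyshev choice reduces this factor by roughly one-third.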
Table 4.12  Lagrange Coefficient Polynomials Used to Form $P_3(x)$ Based on Equally Spaced Nodes $x_k = -1 + 2k/3$

$L_{3,0}(x) = -0.06250000 + 0.06250000x + 0.56250000x^2 - 0.56250000x^3$
$L_{3,1}(x) = 0.56250000 - 1.68750000x - 0.56250000x^2 + 1.68750000x^3$
$L_{3,2}(x) = 0.56250000 + 1.68750000x - 0.56250000x^2 - 1.68750000x^3$
$L_{3,3}(x) = -0.06250000 - 0.06250000x + 0.56250000x^2 + 0.56250000x^3$


Table 4.13  Coefficient Polynomials Used to Form $P_3(x)$ Based on the Chebyshev Nodes $x_k = \cos((7 - 2k)\pi/8)$

$C_0(x) = -0.10355339 + 0.11208538x + 0.70710678x^2 - 0.76536686x^3$
$C_1(x) = 0.60355339 - 1.57716102x - 0.70710678x^2 + 1.84775906x^3$
$C_2(x) = 0.60355339 + 1.57716102x - 0.70710678x^2 - 1.84775906x^3$
$C_3(x) = -0.10355339 - 0.11208538x + 0.70710678x^2 + 0.76536686x^3$


The consequence of this result can be stated by saying that, for Lagrange interpolation $f(x) = P_N(x) + E_N(x)$ on $[-1, 1]$, the minimum value of the error bound

$(\max\{|Q(x)|\})(\max\{|f^{(N+1)}(x)/(N + 1)!|\})$

is achieved when the nodes $\{x_k\}$ are the Chebyshev abscissas of $T_{N+1}(x)$. As an illustration, we look at the Lagrange coefficient polynomials that are used in forming $P_3(x)$. First we use equally spaced nodes and then the Chebyshev nodes. Recall that the Lagrange polynomial of degree $N = 3$ has the form

(11)  $P_3(x) = f(x_0)L_{3,0}(x) + f(x_1)L_{3,1}(x) + f(x_2)L_{3,2}(x) + f(x_3)L_{3,3}(x)$.

Equally Spaced Nodes

If $f(x)$ is approximated by a polynomial of degree at most $N = 3$ on $[-1, 1]$, the equally spaced nodes $x_0 = -1$, $x_1 = -1/3$, $x_2 = 1/3$, and $x_3 = 1$ are easy to use for calculations. Substitution of these values into formula (8) of Section 4.3 and simplifying will produce the coefficient polynomials $L_{3,k}(x)$ in Table 4.12.

Chebyshev Nodes

When $f(x)$ is to be approximated by a polynomial of degree at most $N = 3$, using the Chebyshev nodes $x_0 = \cos(7\pi/8)$, $x_1 = \cos(5\pi/8)$, $x_2 = \cos(3\pi/8)$, and $x_3 = \cos(\pi/8)$, the coefficient polynomials are tedious to find (but this can be done by a computer). The results after simplification are shown in Table 4.13.

Example 4.14. Compare the Lagrange polynomials of degree $N = 3$ for $f(x) = e^x$ that are obtained by using the coefficient polynomials in Tables 4.12 and 4.13, respectively.

Using equally spaced nodes, we get the polynomial

$P(x) = 0.99519577 + 0.99904923x + 0.54788486x^2 + 0.17615196x^3$.

This is obtained by finding the function values

$f(x_0) = e^{(-1)} = 0.36787944$,  $f(x_1) = e^{(-1/3)} = 0.71653131$,
$f(x_2) = e^{(1/3)} = 1.39561243$,  $f(x_3) = e^{(1)} = 2.71828183$,

and using the coefficient polynomials $L_{3,k}(x)$ in Table 4.12, and forming the linear combination

$P(x) = 0.36787944L_{3,0}(x) + 0.71653131L_{3,1}(x) + 1.39561243L_{3,2}(x) + 2.71828183L_{3,3}(x)$.

Similarly, when the Chebyshev nodes are used, we obtain

$V(x) = 0.99461532 + 0.99893323x + 0.54290072x^2 + 0.17517569x^3$.

Notice that the coefficients are different from those of $P(x)$. This is a consequence of using different nodes and function values:

$f(x_0) = e^{-0.92387953} = 0.39697597$,
$f(x_1) = e^{-0.38268343} = 0.68202877$,
$f(x_2) = e^{0.38268343} = 1.46621380$,
$f(x_3) = e^{0.92387953} = 2.51904417$.

Then the alternative set of coefficient polynomials $C_k(x)$ in Table 4.13 is used to form the linear combination

$V(x) = 0.39697597C_0(x) + 0.68202877C_1(x) + 1.46621380C_2(x) + 2.51904417C_3(x)$.

For a comparison of the accuracy of $P(x)$ and $V(x)$, the error functions are graphed in Figure 4.16(a) and (b), respectively. The maximum error $|e^x - P(x)|$ occurs at $x = 0.75490129$, and

$|e^x - P(x)| \le 0.00998481$ for $-1 \le x \le 1$.

The maximum error $|e^x - V(x)|$ occurs at $x = 1$, and we get

$|e^x - V(x)| \le 0.00665687$ for $-1 \le x \le 1$.

Notice that the maximum error in $V(x)$ is about two-thirds the maximum error in $P(x)$. Also, the error is spread out more evenly over the interval. ■
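
The comparison in Example 4.14 can be reproduced without the coefficient tables. The sketch below is one possible route, not the method of the example: it assumes that `polyfit` with four nodes and degree 3 returns the interpolating cubic, and it estimates the maximum errors on a sampling grid of our choosing.

```matlab
f = @(x) exp(x);
xe = [-1 -1/3 1/3 1];                 % equally spaced nodes (Table 4.12)
xc = cos((7 - 2*(0:3))*pi/8);         % Chebyshev nodes (Table 4.13)

% Four nodes determine the cubic uniquely, so the fit interpolates.
% polyfit returns coefficients in descending powers of x.
P = polyfit(xe, f(xe), 3);
V = polyfit(xc, f(xc), 3);

x = linspace(-1, 1, 20001);           % sampling grid for the error
errP = max(abs(f(x) - polyval(P, x)));
errV = max(abs(f(x) - polyval(V, x)));

% fliplr gives ascending powers, the order used in the text.
disp(fliplr(P)); disp(fliplr(V)); disp([errP errV]);
```

The displayed coefficients and errors should agree with the values quoted in the example up to rounding in the last printed digit.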
Figure 4.16  (a) The error function $y = e^x - P(x)$ for Lagrange approximation over $[-1, 1]$. (b) The error function $y = e^x - V(x)$ for Lagrange approximation over $[-1, 1]$.


Runge Phenomenon 

We now look deeper to see the advantage of using the Chebyshev interpolation nodes. Consider Lagrange interpolation to $f(x)$ over the interval $[-1, 1]$ based on equally spaced nodes. Does the error $E_N(x) = f(x) - P_N(x)$ tend to zero as $N$ increases? For functions like $\sin(x)$ or $e^x$, where all the derivatives are bounded by the same constant $M$, the answer is yes. In general, the answer to this question is no, and it is easy to find functions for which the sequence $\{P_N(x)\}$ does not converge. If $f(x) = 1/(1 + 12x^2)$, the maximum of the error term $E_N(x)$ grows when $N \to \infty$. This nonconvergence is called the Runge phenomenon (see Reference [90], pp. 275-278). The Lagrange polynomial of degree 10 based on 11 equally spaced nodes for this function is shown in Figure 4.17(a). Wild oscillations occur near the ends of the interval. If the number of nodes is increased, then the oscillations become larger. This problem occurs because the nodes are equally spaced!

If the Chebyshev nodes are used to construct an interpolating polynomial of degree 10 to $f(x) = 1/(1 + 12x^2)$, the error is much smaller, as seen in Figure 4.17(b). Under the condition that Chebyshev nodes be used, the error $E_N(x)$ will go to zero as $N \to \infty$. In general, if $f(x)$ and $f'(x)$ are continuous on $[-1, 1]$, then it can be proved that Chebyshev interpolation will produce a sequence of polynomials $\{P_N(x)\}$ that converges uniformly to $f(x)$ over $[-1, 1]$.
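
The Runge phenomenon is easy to observe numerically. The following sketch evaluates the Lagrange form directly (rather than fitting coefficients, which becomes ill conditioned at higher degree) for $N = 10$ and $N = 20$; the choice of degrees, the grid, and the array name `maxErr` are our own assumptions.

```matlab
f = @(x) 1 ./ (1 + 12*x.^2);
x = linspace(-1, 1, 2001);            % grid on which the error is sampled
degrees = [10 20];
maxErr = zeros(2, 2);                 % rows: degree; columns: equal, Chebyshev
for i = 1:numel(degrees)
    N = degrees(i);
    nodeSets = {linspace(-1, 1, N+1), cos((2*(0:N) + 1)*pi/(2*N + 2))};
    for s = 1:2
        xk = nodeSets{s};
        yk = f(xk);
        P = zeros(size(x));
        for k = 1:N+1
            others = xk([1:k-1, k+1:N+1]);
            Lk = ones(size(x));       % Lagrange coefficient polynomial L_{N,k}
            for m = 1:N
                Lk = Lk .* (x - others(m)) / (xk(k) - others(m));
            end
            P = P + yk(k)*Lk;
        end
        maxErr(i, s) = max(abs(f(x) - P));
    end
end
disp(maxErr);
```

The first column (equally spaced nodes) should grow from the first row to the second, while the second column (Chebyshev nodes) should shrink.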


Transforming the Interval 

Sometimes it is necessary to take a problem stated on an interval $[a, b]$ and reformulate the problem on the interval $[c, d]$ where the solution is known. If the approximation $P_N(x)$ to $f(x)$ is to be obtained on the interval $[a, b]$, then we change the variable
$P_N(x)$ is the Lagrange polynomial that is based on the Chebyshev nodes given in (14). If $f \in C^{N+1}[a, b]$, then

(15)  $|f(x) - P_N(x)| \le \frac{2(b - a)^{N+1}}{4^{N+1}(N + 1)!} \max_{a \le x \le b}\{|f^{(N+1)}(x)|\}$.

Example 4.15. For $f(x) = \sin(x)$ on $[0, \pi/4]$, find the Chebyshev nodes and the error bound (15) for the Lagrange polynomial $P_5(x)$.

Formulas (12) and (13) are used to find the nodes:

$x_k = \cos\left(\frac{(11 - 2k)\pi}{12}\right)\frac{\pi}{8} + \frac{\pi}{8}$ for $k = 0, 1, \ldots, 5$.

Using the bound $|f^{(6)}(x)| \le |-\sin(\pi/4)| = 2^{-1/2} = M$ in (15), we get

$|f(x) - P_N(x)| \le \left(\frac{\pi}{8}\right)^6 \left(\frac{2}{6!}\right) 2^{-1/2} \le 0.00000720$. ■
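
As a quick check on Example 4.15, the nodes can be generated and the interpolation error sampled. This is a sketch under two assumptions of ours: the nodes are the zeros of $T_6$ mapped linearly from $[-1, 1]$ to $[a, b]$, and `polyfit` with six nodes and degree 5 returns the interpolating polynomial.

```matlab
a = 0; b = pi/4; N = 5;
k = 0:N;
t = cos((2*N + 1 - 2*k)*pi/(2*N + 2));   % Chebyshev nodes on [-1, 1]
x = t*(b - a)/2 + (a + b)/2;             % the same nodes mapped to [a, b]

P = polyfit(x, sin(x), N);               % interpolating polynomial P_5
xx = linspace(a, b, 5001);
actualErr = max(abs(sin(xx) - polyval(P, xx)));
quotedBound = 0.00000720;                % value stated in the example
disp([actualErr quotedBound]);
```

The sampled error should lie below the quoted bound, as the example requires.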

Orthogonal Property 

In Example 4.14, the Chebyshev nodes were used to find the Lagrange interpolating polynomial. In general, this implies that the Chebyshev polynomial of degree $N$ can be obtained by Lagrange interpolation based on the $N + 1$ nodes that are the $N + 1$ zeros of $T_{N+1}(x)$. However, a direct approach to finding the approximation polynomial is to express $P_N(x)$ as a linear combination of the polynomials $T_k(x)$, which were given in Table 4.11. Therefore, the Chebyshev interpolating polynomial can be written in the form

(16)  $P_N(x) = \sum_{k=0}^{N} c_k T_k(x) = c_0T_0(x) + c_1T_1(x) + \cdots + c_NT_N(x)$.

The coefficients $\{c_k\}$ in (16) are easy to find. The technical proof requires the use of the following orthogonality properties. Let

(17)  $x_k = \cos\left(\pi\frac{2k + 1}{2N + 2}\right)$ for $k = 0, 1, \ldots, N$;

(18)  $\sum_{k=0}^{N} T_i(x_k)T_j(x_k) = 0$ when $i \ne j$,

(19)  $\sum_{k=0}^{N} T_i(x_k)T_j(x_k) = \frac{N + 1}{2}$ when $i = j \ne 0$,

(20)  $\sum_{k=0}^{N} T_0(x_k)T_0(x_k) = N + 1$.
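
The discrete orthogonality relations (18) through (20) can be confirmed numerically before they are used. The sketch below picks $N = 5$ (an arbitrary choice of ours) and evaluates $T_i(x_k)$ through Property 4; the names `Tvals`, `G`, and `gap` are illustrative only.

```matlab
N = 5;
k = 0:N;
xk = cos((2*k + 1)*pi/(2*N + 2));        % nodes (17)

% Row i+1 of Tvals holds T_i(x_k) = cos(i*arccos(x_k)) for i = 0..N.
Tvals = cos((0:N)' * acos(xk));

% G(i+1, j+1) is the sum over k of T_i(x_k)*T_j(x_k).
G = Tvals * Tvals';

% (18)-(20) predict a diagonal matrix: N+1 first, then (N+1)/2.
expected = diag([N+1, (N+1)/2*ones(1, N)]);
gap = max(max(abs(G - expected)));
```

A value of `gap` near machine precision indicates that all three relations hold for this $N$.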


Property 4 and the identities (18) and (20) can be used to prove the following 
theorem. 
Example 4.16. Find the Chebyshev polynomial $P_3(x)$ that approximates the function $f(x) = e^x$ over $[-1, 1]$.

The coefficients are calculated using formulas (22) and (23), and the nodes $x_k = \cos(\pi(2k + 1)/8)$ for $k = 0, 1, 2, 3$.

$c_0 = \frac{1}{4}\sum_{k=0}^{3} e^{x_k}T_0(x_k) = \frac{1}{4}\sum_{k=0}^{3} e^{x_k} = 1.26606568$,

$c_1 = \frac{1}{2}\sum_{k=0}^{3} e^{x_k}T_1(x_k) = \frac{1}{2}\sum_{k=0}^{3} e^{x_k}x_k = 1.13031500$,

$c_2 = \frac{1}{2}\sum_{k=0}^{3} e^{x_k}T_2(x_k) = \frac{1}{2}\sum_{k=0}^{3} e^{x_k}\cos\left(2\pi\frac{2k + 1}{8}\right) = 0.27145036$,

$c_3 = \frac{1}{2}\sum_{k=0}^{3} e^{x_k}T_3(x_k) = \frac{1}{2}\sum_{k=0}^{3} e^{x_k}\cos\left(3\pi\frac{2k + 1}{8}\right) = 0.04379392$.

Therefore, the Chebyshev polynomial $P_3(x)$ for $e^x$ is

(24)  $P_3(x) = 1.26606568T_0(x) + 1.13031500T_1(x) + 0.27145036T_2(x) + 0.04379392T_3(x)$.

If the Chebyshev polynomial (24) is expanded in powers of $x$, the result is

$P_3(x) = 0.99461532 + 0.99893324x + 0.54290072x^2 + 0.17517568x^3$,

which is the same as the polynomial $V(x)$ in Example 4.14. If the goal is to find the Chebyshev polynomial, formulas (22) and (23) are preferred. ■
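
The arithmetic of Example 4.16 can be sketched in a few lines. The script assumes the coefficient sums take the form used in the example (a factor $2/(N+1)$, halved for $c_0$) and expands the result in powers of $x$ with the entries of Table 4.11; `powerCoeffs` is our own name.

```matlab
N = 3;
k = 0:N;
xk = cos((2*k + 1)*pi/(2*N + 2));        % nodes of the example
yk = exp(xk);

c = zeros(1, N+1);
for j = 0:N
    % T_j(x_k) = cos(j*(2k+1)*pi/(2N+2)) by Property 4
    c(j+1) = 2/(N+1) * sum(yk .* cos(j*(2*k + 1)*pi/(2*N + 2)));
end
c(1) = c(1)/2;                            % c_0 carries the factor 1/(N+1)

% Expand c0*T0 + c1*T1 + c2*T2 + c3*T3 in ascending powers of x,
% using T2 = 2x^2 - 1 and T3 = 4x^3 - 3x.
powerCoeffs = [c(1) - c(3), c(2) - 3*c(4), 2*c(3), 4*c(4)];
disp(c); disp(powerCoeffs);
```

The two displayed rows should match (24) and its expansion in powers of $x$.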
MATLAB

The following program uses the eval command instead of the feval command used in earlier programs. The eval command interprets a MATLAB text string as an expression or statement. For example, the following commands will quickly evaluate cosine at the values $x = k/10$ for $k = 0, 1, \ldots, 5$:

>> x=0:.1:.5;
>> eval('cos(x)')
ans =
    1.0000    0.9950    0.9801    0.9553    0.9211    0.8776


Program 4.3 (Chebyshev Approximation). To construct and evaluate the Chebyshev interpolating polynomial of degree $N$ over the interval $[-1, 1]$, where

$P(x) = \sum_{j=0}^{N} c_jT_j(x)$

is based on the nodes

$x_k = \cos\left(\frac{(2k + 1)\pi}{2N + 2}\right)$.


function [C,X,Y]=cheby(fun,n,a,b)
%Input  - fun is the string function to be approximated
%       - n is the degree of the Chebyshev interpolating polynomial
%       - a is the left end point
%       - b is the right end point
%Output - C is the coefficient list for the polynomial
%       - X contains the abscissas
%       - Y contains the ordinates
if nargin==2, a=-1;b=1;end
d=pi/(2*n+2);
C=zeros(1,n+1);
for k=1:n+1
   X(k)=cos((2*k-1)*d);
end
X=(b-a)*X/2+(a+b)/2;
x=X;
Y=eval(fun);
for k=1:n+1
   z=(2*k-1)*d;
   for j=1:n+1
      C(j)=C(j)+Y(k)*cos((j-1)*z);
   end
end
C=2*C/(n+1);
C(1)=C(1)/2;


Exercises for Chebyshev Polynomials (Optional)

1. Use Property 1 and
(a) construct $T_4(x)$ from $T_3(x)$ and $T_2(x)$.
(b) construct $T_5(x)$ from $T_4(x)$ and $T_3(x)$.

2. Use Property 1 and
(a) construct $T_6(x)$ from $T_5(x)$ and $T_4(x)$.
(b) construct $T_7(x)$ from $T_6(x)$ and $T_5(x)$.

3. Use mathematical induction to prove Property 2.

4. Use mathematical induction to prove Property 3.

5. Find the maximum and minimum values of $T_2(x)$.

6. Find the maximum and minimum values of $T_3(x)$.
Hint. $T_3'(1/2) = 0$ and $T_3'(-1/2) = 0$.

7. Find the maximum and minimum values of $T_4(x)$.
Hint. $T_4'(0) = 0$, $T_4'(2^{-1/2}) = 0$, and $T_4'(-2^{-1/2}) = 0$.

8. Let $f(x) = \sin(x)$ on $[-1, 1]$.
(a) Use the coefficient polynomials in Table 4.13 to obtain the Lagrange-Chebyshev polynomial approximation $P_3(x)$.
(b) Find the error bound for $|\sin(x) - P_3(x)|$.

9. Let $f(x) = \ln(x + 2)$ on $[-1, 1]$.
(a) Use the coefficient polynomials in Table 4.13 to obtain the Lagrange-Chebyshev polynomial approximation $P_3(x)$.
(b) Find the error bound for $|\ln(x + 2) - P_3(x)|$.

10. The Lagrange polynomial of degree $N = 2$ has the form

$P_2(x) = f(x_0)L_{2,0}(x) + f(x_1)L_{2,1}(x) + f(x_2)L_{2,2}(x)$.

If the Chebyshev nodes $x_0 = \cos(5\pi/6)$, $x_1 = 0$, and $x_2 = \cos(\pi/6)$ are used, show that the coefficient polynomials are

$L_{2,0}(x) = -\frac{x}{\sqrt{3}} + \frac{2x^2}{3}$,

$L_{2,1}(x) = 1 - \frac{4x^2}{3}$,

$L_{2,2}(x) = \frac{x}{\sqrt{3}} + \frac{2x^2}{3}$.

11. Let $f(x) = \cos(x)$ on $[-1, 1]$.
(a) Use the coefficient polynomials in Exercise 10 to get the Lagrange-Chebyshev polynomial approximation $P_2(x)$.
(b) Find the error bound for $|\cos(x) - P_2(x)|$.

12. Let $f(x) = e^x$ on $[-1, 1]$.
(a) Use the coefficient polynomials in Exercise 10 to get the Lagrange-Chebyshev polynomial approximation $P_2(x)$.
(b) Find the error bound for $|e^x - P_2(x)|$.

In Exercises 13 through 15, compare the Taylor polynomial and the Lagrange-Chebyshev approximations to $f(x)$ on $[-1, 1]$. Find their error bounds.

13. $f(x) = \sin(x)$ and $N = 7$; the Lagrange-Chebyshev polynomial is

$\sin(x) \approx 0.99999998x - 0.16666599x^3 + 0.00832995x^5 - 0.00019291x^7$.

14. $f(x) = \cos(x)$ and $N = 6$; the Lagrange-Chebyshev polynomial is

$\cos(x) \approx 1 - 0.49999734x^2 + 0.04164535x^4 - 0.00134608x^6$.

15. $f(x) = e^x$ and $N = 7$; the Lagrange-Chebyshev polynomial is

$e^x \approx 0.99999980 + 0.99999998x + 0.50000634x^2 + 0.16666737x^3 + 0.04163504x^4 + 0.00832984x^5 + 0.00143925x^6 + 0.00020399x^7$.

16. Prove equation (18).

17. Prove equation (19).

Algorithms and Programs

In Problems 1 through 6, use Program 4.3 to compute the coefficients $\{c_k\}$ for the Chebyshev polynomial approximation $P_N(x)$ to $f(x)$ over $[-1, 1]$, when (a) $N = 4$, (b) $N = 5$, (c) $N = 6$, and (d) $N = 7$. In each case, plot $f(x)$ and $P_N(x)$ on the same coordinate system.

1. $f(x) = e^x$

2. $f(x) = \sin(x)$

3. $f(x) = \cos(x)$

4. $f(x) = \ln(x + 2)$

5. $f(x) = (x + 2)^{1/2}$

6. $f(x) = (x + 2)^{(x+2)}$

7. Use Program 4.3 ($N = 5$) to obtain an approximation for $\int_0^1 \cos(x^2)\,dx$.

4.6 Padé Approximations

In this section we introduce the notion of rational approximations for functions. The function $f(x)$ will be approximated over a small portion of its domain. For example, if $f(x) = \cos(x)$, it is sufficient to have a formula to generate approximations on the interval $[0, \pi/2]$. Then trigonometric identities can be used to compute $\cos(x)$ for any value $x$ that lies outside $[0, \pi/2]$.

A rational approximation to $f(x)$ on $[a, b]$ is the quotient of two polynomials $P_N(x)$ and $Q_M(x)$ of degrees $N$ and $M$, respectively. We use the notation $R_{N,M}(x)$ to denote this quotient:

(1)  $R_{N,M}(x) = \frac{P_N(x)}{Q_M(x)}$ for $a \le x \le b$.

Our goal is to make the maximum error as small as possible. For a given amount of computational effort, one can usually construct a rational approximation that has a smaller overall error on $[a, b]$ than a polynomial approximation. Our development is an introduction and will be limited to Padé approximations.

The method of Padé requires that $f(x)$ and its derivative be continuous at $x = 0$. There are two reasons for the arbitrary choice of $x = 0$. First, it makes the manipulations simpler. Second, a change of variable can be used to shift the calculations over to an interval that contains zero. The polynomials used in (1) are

(2)  $P_N(x) = p_0 + p_1x + p_2x^2 + \cdots + p_Nx^N$

and

(3)  $Q_M(x) = 1 + q_1x + q_2x^2 + \cdots + q_Mx^M$.

The polynomials in (2) and (3) are constructed so that $f(x)$ and $R_{N,M}(x)$ agree at $x = 0$ and their derivatives up to order $N + M$ agree at $x = 0$. In the case $Q_0(x) = 1$, the approximation is just the Maclaurin expansion for $f(x)$. For a fixed value of $N + M$ the error is smallest when $P_N(x)$ and $Q_M(x)$ have the same degree or when $P_N(x)$ has degree one higher than $Q_M(x)$.

Notice that the constant coefficient of $Q_M$ is $q_0 = 1$. This is permissible, because it cannot be 0 and $R_{N,M}(x)$ is not changed when both $P_N(x)$ and $Q_M(x)$ are divided by the same constant. Hence the rational function $R_{N,M}(x)$ has $N + M + 1$ unknown coefficients. Assume that $f(x)$ is analytic and has the Maclaurin expansion

(4)  $f(x) = a_0 + a_1x + a_2x^2 + \cdots + a_kx^k + \cdots$,
and form the difference $f(x)Q_M(x) - P_N(x) = Z(x)$:

(5)  $\left(\sum_{j=0}^{\infty} a_jx^j\right)\left(\sum_{j=0}^{M} q_jx^j\right) - \sum_{j=0}^{N} p_jx^j = \sum_{j=N+M+1}^{\infty} c_jx^j$.

The lower index $j = N + M + 1$ in the summation on the right side of (5) is chosen because the first $N + M$ derivatives of $f(x)$ and $R_{N,M}(x)$ are to agree at $x = 0$.

When the left side of (5) is multiplied out and the coefficients of the powers of $x^k$ are set equal to zero for $k = 0, 1, \ldots, N + M$, the result is a system of $N + M + 1$ linear equations:

(6)
$a_0 - p_0 = 0$
$q_1a_0 + a_1 - p_1 = 0$
$q_2a_0 + q_1a_1 + a_2 - p_2 = 0$
$q_3a_0 + q_2a_1 + q_1a_2 + a_3 - p_3 = 0$
$\vdots$
$q_Ma_{N-M} + q_{M-1}a_{N-M+1} + \cdots + a_N - p_N = 0$

and

(7)
$q_Ma_{N-M+1} + q_{M-1}a_{N-M+2} + \cdots + q_1a_N + a_{N+1} = 0$
$q_Ma_{N-M+2} + q_{M-1}a_{N-M+3} + \cdots + q_1a_{N+1} + a_{N+2} = 0$
$\vdots$
$q_Ma_N + q_{M-1}a_{N+1} + \cdots + q_1a_{N+M-1} + a_{N+M} = 0$.

Notice that in each equation the sum of the subscripts on the factors of each product is the same, and this sum increases consecutively from 0 to $N + M$. The $M$ equations in (7) involve only the unknowns $q_1, q_2, \ldots, q_M$ and must be solved first. Then the equations in (6) are used successively to find $p_0, p_1, \ldots, p_N$.
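
The two-stage solution of (6) and (7) can be sketched as follows. The script is an illustration under our own assumptions: it uses the Maclaurin coefficients $a_j = 1/j!$ of $e^x$ with $N = M = 2$, treats any $a_j$ with negative subscript as zero, and stores $a_j$ in `a(j+1)` because MATLAB indexing starts at 1.

```matlab
N = 2; M = 2;
a = 1 ./ factorial(0:N+M);       % a_0, ..., a_{N+M} for f(x) = exp(x)

% Stage 1: the M equations (7) in the unknowns q_1, ..., q_M.
% Equation i reads: sum_{j=1}^{M} q_j*a_{N+i-j} = -a_{N+i}.
G = zeros(M, M);
for i = 1:M
    for j = 1:M
        idx = N + i - j;          % subscript of a in this product
        if idx >= 0
            G(i, j) = a(idx + 1);
        end
    end
end
q = [1; G \ (-a(N+2:N+M+1)')];   % q_0 = 1 followed by q_1, ..., q_M

% Stage 2: equations (6) give p_k = sum_{j=0}^{min(k,M)} q_j*a_{k-j}.
p = zeros(N+1, 1);
for k = 0:N
    for j = 0:min(k, M)
        p(k+1) = p(k+1) + q(j+1)*a(k-j+1);
    end
end
disp(12*p'); disp(12*q');
```

Scaling by 12 clears the fractions and gives $R_{2,2}(x) = (12 + 6x + x^2)/(12 - 6x + x^2)$, the form that appears in the exercises for this section.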


Example 4.17. Establish the Padé approximation

(8)  $\cos(x) \approx R_{4,4}(x) = \frac{15{,}120 - 6900x^2 + 313x^4}{15{,}120 + 660x^2 + 13x^4}$.

See Figure 4.18 for the graphs of $\cos(x)$ and $R_{4,4}(x)$ over $[-5, 5]$.

If the Maclaurin expansion for $\cos(x)$ is used, we will obtain nine equations in nine unknowns. Instead, notice that both $\cos(x)$ and $R_{4,4}(x)$ are even functions and involve powers of $x^2$. We can simplify the computations if we start with $f(x) = \cos(x^{1/2})$:

(9)  $f(x) = 1 - \frac{1}{2}x + \frac{1}{24}x^2 - \frac{1}{720}x^3 + \frac{1}{40{,}320}x^4 - \cdots$.

In this case, equation (5) becomes

$\left(1 - \frac{1}{2}x + \frac{1}{24}x^2 - \frac{1}{720}x^3 + \frac{1}{40{,}320}x^4 - \cdots\right)\left(1 + q_1x + q_2x^2\right) - p_0 - p_1x - p_2x^2$
$\quad = 0 + 0x + 0x^2 + 0x^3 + 0x^4 + c_5x^5 + c_6x^6 + \cdots$.


Figure 4.18  The graph of $y = \cos(x)$ and its Padé approximation $R_{4,4}(x)$.
Figure 4.19  (a) Graph of the error $E_R(x) = \cos(x) - R_{4,4}(x)$ for the Padé approximation $R_{4,4}(x)$. (b) Graph of the error $E_P(x) = \cos(x) - P_6(x)$ for the Taylor approximation $P_6(x)$.

To evaluate (14), first compute and store $x^2$, then proceed from the bottom right term in the denominator and tally the operations: addition, division, addition, addition, division, and subtraction. Hence it takes a total of seven arithmetic operations to evaluate $R_{4,4}(x)$ in continued fraction form in (14).

We can compare $R_{4,4}(x)$ with the Taylor polynomial $P_6(x)$ of degree $N = 6$, which requires seven arithmetic operations to evaluate when it is written in the nested form

(15)  $P_6(x) = 1 + x^2\left(-\frac{1}{2} + x^2\left(\frac{1}{24} - \frac{1}{720}x^2\right)\right)$
$\quad\quad = 1 + x^2(-0.5 + x^2(0.0416666667 - 0.0013888889x^2))$.

The graphs of $E_R(x) = \cos(x) - R_{4,4}(x)$ and $E_P(x) = \cos(x) - P_6(x)$ over $[-1, 1]$ are shown in Figure 4.19(a) and (b), respectively. The largest errors occur at the end points and are $E_R(1) = -0.0000003599$ and $E_P(1) = 0.0000245281$, respectively. The magnitude of the largest error for $R_{4,4}(x)$ is about 1.467% of the error for $P_6(x)$. The advantage of the Padé approximation over the Taylor approximation is even greater on smaller intervals, and over $[-0.1, 0.1]$ we find that $E_R(0.1) = -0.0000000004$ and $E_P(0.1) = 0.0000000966$, so the magnitude of the error for $R_{4,4}(x)$ is about 0.384% of the magnitude of the error for $P_6(x)$.
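
The end-point errors quoted for $[-1, 1]$ can be reproduced directly. The following sketch evaluates $R_{4,4}(x)$ from its quotient form rather than the continued fraction, and $P_6(x)$ from the nested form (15); the handle names `R44` and `P6` are ours.

```matlab
% Pade approximation of Example 4.17, written as a quotient.
R44 = @(x) (15120 - 6900*x.^2 + 313*x.^4) ./ (15120 + 660*x.^2 + 13*x.^4);

% Taylor polynomial of degree 6 in nested form.
P6 = @(x) 1 + x.^2 .* (-1/2 + x.^2 .* (1/24 - x.^2/720));

ER1 = cos(1) - R44(1);           % error of the Pade approximation at x = 1
EP1 = cos(1) - P6(1);            % error of the Taylor polynomial at x = 1
ratio = abs(ER1)/abs(EP1);       % should be close to 0.01467, i.e. 1.467%
fprintf('%.10f  %.10f  %.3f%%\n', ER1, EP1, 100*ratio);
```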
Exercises for Padé Approximations

1. Establish the Padé approximation:

$e^x \approx R_{1,1}(x) = \frac{2 + x}{2 - x}$.

2. (a) Find the Padé approximation $R_{1,1}(x)$ for $f(x) = \ln(1 + x)/x$. Hint. Start with the Maclaurin expansion:

$f(x) = 1 - \frac{x}{2} + \frac{x^2}{3} - \cdots$.

(b) Use the result in part (a) to establish the approximation

$\ln(1 + x) \approx R_{2,1}(x) = \frac{6x + x^2}{6 + 4x}$.

3. (a) Find $R_{1,1}(x)$ for $f(x) = \tan(x^{1/2})/x^{1/2}$. Hint. Start with the Maclaurin expansion:

$f(x) = 1 + \frac{x}{3} + \frac{2x^2}{15} + \cdots$.

(b) Use the result in part (a) to establish the approximation

$\tan(x) \approx R_{3,2}(x) = \frac{15x - x^3}{15 - 6x^2}$.

4. (a) Find $R_{1,1}(x)$ for $f(x) = \arctan(x^{1/2})/x^{1/2}$. Hint. Start with the Maclaurin expansion:

$f(x) = 1 - \frac{x}{3} + \frac{x^2}{5} - \cdots$.

(b) Use the result in part (a) to establish the approximation

$\arctan(x) \approx R_{3,2}(x) = \frac{15x + 4x^3}{15 + 9x^2}$.

(c) Express the rational function $R_{3,2}(x)$ in part (b) in continued fraction form.

5. (a) Establish the Padé approximation:

$e^x \approx R_{2,2}(x) = \frac{12 + 6x + x^2}{12 - 6x + x^2}$.

(b) Express the rational function $R_{2,2}(x)$ in part (a) in continued fraction form.

6. (a) Find the Padé approximation $R_{2,2}(x)$ for $f(x) = \ln(1 + x)/x$. Hint. Start with the Maclaurin expansion:

$f(x) = 1 - \frac{x}{2} + \frac{x^2}{3} - \frac{x^3}{4} + \frac{x^4}{5} - \cdots$.

(b) Use the result in part (a) to establish

$\ln(1 + x) \approx R_{3,2}(x) = \frac{30x + 21x^2 + x^3}{30 + 36x + 9x^2}$.

(c) Express the rational function $R_{3,2}(x)$ in part (b) in continued fraction form.

7. (a) Find $R_{2,2}(x)$ for $f(x) = \tan(x^{1/2})/x^{1/2}$. Hint. Start with the Maclaurin expansion:

$f(x) = 1 + \frac{x}{3} + \frac{2x^2}{15} + \frac{17x^3}{315} + \frac{62x^4}{2835} + \cdots$.

(b) Use the result in part (a) to establish

$\tan(x) \approx R_{5,4}(x) = \frac{945x - 105x^3 + x^5}{945 - 420x^2 + 15x^4}$.

(c) Express the rational function $R_{5,4}(x)$ in part (b) in continued fraction form.

8. (a) Find $R_{2,2}(x)$ for $f(x) = \arctan(x^{1/2})/x^{1/2}$. Hint. Start with the Maclaurin expansion:

$f(x) = 1 - \frac{x}{3} + \frac{x^2}{5} - \frac{x^3}{7} + \frac{x^4}{9} - \cdots$.

(b) Use the result in part (a) to establish

$\arctan(x) \approx R_{5,4}(x) = \frac{945x + 735x^3 + 64x^5}{945 + 1050x^2 + 225x^4}$.

(c) Express the rational function $R_{5,4}(x)$ in part (b) in continued fraction form.

9. Establish the Padé approximation:

$e^x \approx R_{3,3}(x) = \frac{120 + 60x + 12x^2 + x^3}{120 - 60x + 12x^2 - x^3}$.

10. Establish the Padé approximation:

$e^x \approx R_{4,4}(x) = \frac{1680 + 840x + 180x^2 + 20x^3 + x^4}{1680 - 840x + 180x^2 - 20x^3 + x^4}$.

Algorithms and Programs

1. Compare the following approximations to $f(x) = e^x$.

Taylor:  $T_4(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24}$

Padé:  $R_{2,2}(x) = \frac{12 + 6x + x^2}{12 - 6x + x^2}$

(a) Plot $f(x)$, $T_4(x)$, and $R_{2,2}(x)$ on the same coordinate system.

(b) Determine the maximum error that occurs when $f(x)$ is approximated with $T_4(x)$ and $R_{2,2}(x)$, respectively, over the interval $[-1, 1]$.

2. Compare the following approximations to $f(x) = \ln(1 + x)$.

Taylor:  $T_5(x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \frac{x^5}{5}$

Padé:  $R_{3,2}(x) = \frac{30x + 21x^2 + x^3}{30 + 36x + 9x^2}$

(a) Plot $f(x)$, $T_5(x)$, and $R_{3,2}(x)$ on the same coordinate system.

(b) Determine the maximum error that occurs when $f(x)$ is approximated with $T_5(x)$ and $R_{3,2}(x)$, respectively, over the interval $[-1, 1]$.

3. Compare the following approximations to $f(x) = \tan(x)$.

Taylor:  $T_9(x) = x + \frac{x^3}{3} + \frac{2x^5}{15} + \frac{17x^7}{315} + \frac{62x^9}{2835}$

Padé:  $R_{5,4}(x) = \frac{945x - 105x^3 + x^5}{945 - 420x^2 + 15x^4}$

(a) Plot $f(x)$, $T_9(x)$, and $R_{5,4}(x)$ on the same coordinate system.

(b) Determine the maximum error that occurs when $f(x)$ is approximated with $T_9(x)$ and $R_{5,4}(x)$, respectively, over the interval $[-1, 1]$.

4. Compare the following Padé approximations to $f(x) = \sin(x)$ over the interval $[-1.2, 1.2]$.

$R_{5,4}(x) = \frac{166{,}320x - 22{,}260x^3 + 551x^5}{15(11{,}088 + 364x^2 + 5x^4)}$

$R_{7,6}(x) = \frac{11{,}511{,}339{,}840x - 1{,}640{,}635{,}920x^3 + 52{,}785{,}432x^5 - 479{,}249x^7}{7(1{,}644{,}477{,}120 + 39{,}702{,}960x^2 + 453{,}960x^4 + 2{,}623x^6)}$

(a) Plot $f(x)$, $R_{5,4}(x)$, and $R_{7,6}(x)$ on the same coordinate system.

(b) Determine the maximum error that occurs when $f(x)$ is approximated with $R_{5,4}(x)$ and $R_{7,6}(x)$, respectively, over the interval $[-1.2, 1.2]$.

5. (a) Use equations (6) and (7) to derive $R_{6,6}(x)$ and $R_{8,8}(x)$ for $f(x) = \cos(x)$ over the interval $[-1.2, 1.2]$.

(b) Plot $f(x)$, $R_{6,6}(x)$, and $R_{8,8}(x)$ on the same coordinate system.

(c) Determine the maximum error that occurs when $f(x)$ is approximated with $R_{6,6}(x)$ and $R_{8,8}(x)$, respectively, over the interval $[-1.2, 1.2]$.
Curve Fitting


fitting of experimental data. For example, in 1601 the German astronomer Johannes Kepler formulated the third law of planetary motion, $T = Cx^{3/2}$, where $x$ is the distance to the sun measured in millions of kilometers, $T$ is the orbital period measured in days, and $C$ is a constant. The observed data pairs $(x, T)$ for the first four planets, Mercury, Venus, Earth, and Mars, are (58, 88), (108, 225), (150, 365), and (228, 687), and the coefficient $C$ obtained from the method of least squares is $C = 0.199769$. The curve $T = 0.199769x^{3/2}$ and the data points are shown in Figure 5.1.
Least-squares Line

In science and engineering it is often the case that an experiment produces a set of data points $(x_1, y_1), \ldots, (x_N, y_N)$, where the abscissas $\{x_k\}$ are distinct. One goal of numerical methods is to determine a formula $y = f(x)$ that relates these variables. Usually, a class of allowable formulas is chosen and then coefficients must be determined. There are many different possibilities for the type of function that can be used. Often there is an underlying mathematical model, based on the physical situation, that will determine the form of the function. In this section we emphasize the class of linear functions of the form

(1)  $y = f(x) = Ax + B$.

In Chapter 4 we saw how to construct a polynomial that passes through a set of points. If all the numerical values $\{x_k\}$, $\{y_k\}$ are known to several significant digits of accuracy, then polynomial interpolation can be used successfully; otherwise it cannot. Some experiments are devised using specialized equipment so that the data points will have at least five digits of accuracy. However, many experiments are done with equipment that is reliable only to three or fewer digits of accuracy. Often there is an experimental error in the measurements, and although three digits are recorded for the values $\{x_k\}$ and $\{y_k\}$, it is realized that the true value $f(x_k)$ satisfies

(2)  $f(x_k) = y_k + e_k$,

where $e_k$ is the measurement error.

How do we find the best linear approximation of the form (1) that goes near (not always through) the points? To answer this question, we need to discuss the errors (also called deviations or residuals):

(3)  $e_k = f(x_k) - y_k$ for $1 \le k \le N$.

There are several norms that can be used with the residuals in (3) to measure how far the curve $y = f(x)$ lies from the data.

(4)  Maximum error:  $E_\infty(f) = \max_{1 \le k \le N}\{|f(x_k) - y_k|\}$,

(5)  Average error:  $E_1(f) = \frac{1}{N}\sum_{k=1}^{N} |f(x_k) - y_k|$,

(6)  Root-mean-square error:  $E_2(f) = \left(\frac{1}{N}\sum_{k=1}^{N} |f(x_k) - y_k|^2\right)^{1/2}$.

The next example shows how to apply these norms when a function and a set of data points are given.
Table 5.1  Calculations for Finding $E_1(f)$ and $E_2(f)$ for Example 5.1


Example 5.1. Compare the maximum error, average error, and rms error for the linear approximation $y = f(x) = 8.6 - 1.6x$ to the data points $(-1, 10)$, $(0, 9)$, $(1, 7)$, $(2, 5)$, $(3, 4)$, $(4, 3)$, $(5, 0)$, and $(6, -1)$.

The errors are found using the values for $f(x_k)$ and $e_k$ given in Table 5.1.

(7)  $E_\infty(f) = \max\{0.2, 0.4, 0.0, 0.4, 0.2, 0.8, 0.6, 0.0\} = 0.8$,

(8)  $E_1(f) = \frac{1}{8}(2.6) = 0.325$,

(9)  $E_2(f) = \left(\frac{1.4}{8}\right)^{1/2} \approx 0.41833$.

We can see that the maximum error is largest, and if one point is badly in error, its value determines $E_\infty(f)$. The average error $E_1(f)$ simply averages the absolute value of the error at the various points. It is often used because it is easy to compute. The error $E_2(f)$ is often used when the statistical nature of the errors is considered.

A best-fitting line is found by minimizing one of the quantities in equations (4) through (6). Hence there are three best-fitting lines that we could find. The third norm $E_2(f)$ is the traditional choice because it is much easier to minimize computationally. ■
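
The three norms of Example 5.1 take only a few lines to compute. The sketch below uses the data points and the line of the example; the variable names `Einf`, `E1`, and `E2` are ours.

```matlab
xk = [-1 0 1 2 3 4 5 6];
yk = [10 9 7 5 4 3 0 -1];
f = @(x) 8.6 - 1.6*x;            % linear approximation of Example 5.1

e = f(xk) - yk;                  % residuals e_k, equation (3)
Einf = max(abs(e));              % maximum error, equation (4)
E1 = mean(abs(e));               % average error, equation (5)
E2 = sqrt(mean(e.^2));           % root-mean-square error, equation (6)
fprintf('%.1f  %.3f  %.5f\n', Einf, E1, E2);
```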


Finding the Least-squares Line

Let $\{(x_k, y_k)\}_{k=1}^{N}$ be a set of $N$ points, where the abscissas $\{x_k\}$ are distinct. The least-squares line $y = f(x) = Ax + B$ is the line that minimizes the root-mean-square error $E_2(f)$.

The quantity $E_2(f)$ will be a minimum if and only if the quantity $N(E_2(f))^2 = \sum_{k=1}^{N}(Ax_k + B - y_k)^2$ is a minimum. The latter is visualized geometrically by minimizing the sum of the squares of the vertical distances from the points to the line. The next result explains this process.
Table 5.2  Obtaining the Coefficients for Normal Equations


Now hold $A$ fixed and differentiate $E(A, B)$ with respect to $B$ and get

(13)  $\frac{\partial E(A, B)}{\partial B} = \sum_{k=1}^{N} 2(Ax_k + B - y_k) = 2\sum_{k=1}^{N}(Ax_k + B - y_k)$.

Setting the partial derivatives equal to zero in (12) and (13), use the distributive properties of summation to obtain

(14)  $0 = \sum_{k=1}^{N}(Ax_k^2 + Bx_k - x_ky_k) = A\sum_{k=1}^{N} x_k^2 + B\sum_{k=1}^{N} x_k - \sum_{k=1}^{N} x_ky_k$,

(15)  $0 = \sum_{k=1}^{N}(Ax_k + B - y_k) = A\sum_{k=1}^{N} x_k + NB - \sum_{k=1}^{N} y_k$.

Equations (14) and (15) can be rearranged in the standard form for a system and result in the normal equations (10). The solution to this system can be obtained by one of the techniques for solving a linear system from Chapter 3. However, the method employed in Program 5.1 translates the data points so that a well-conditioned matrix is employed (see exercises).

Example 5.2. Find the least-squares line for the data points given in Example 5.1. 

The sums required for the normal equations (10) are easily obtained using the values 
in Table 5.2. The linear system involving A and B is 


92A + 20B = 25
20A +  8B = 37.

Figure 5.3 The least-squares line y = -1.6071429x + 8.6428571.

The solution of the linear system is $A \approx -1.6071429$ and $B \approx 8.6428571$. Therefore, the least-squares line is (see Figure 5.3)

y = -1.6071429x + 8.6428571. ■
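As a quick numerical check of Example 5.2, the 2-by-2 system can be solved with the backslash operator. This is only a sketch: it uses the coefficients printed in the example, and the variable names `G`, `r`, and `sol` are illustrative and do not come from the text.

```matlab
% Normal equations from Example 5.2:
%   92A + 20B = 25
%   20A +  8B = 37
G = [92 20; 20 8];      % coefficient matrix (illustrative name)
r = [25; 37];           % right-hand side
sol = G\r;              % solve the linear system
A = sol(1);
B = sol(2);
fprintf('A = %.7f, B = %.7f\n', A, B);
```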


The Power Fit y = Ax^M

Some situations involve $f(x) = Ax^M$, where $M$ is a known constant. The planetary motion data shown in Figure 5.1 is one such case. In these cases there is only one parameter $A$ to be determined.

Theorem 5.2 (Power Fit). Suppose that $\{(x_k, y_k)\}_{k=1}^{N}$ are $N$ points, where the abscissas are distinct. The coefficient $A$ of the least-squares power curve $y = Ax^M$ is given by

(16) $A = \left(\sum_{k=1}^{N} x_k^M y_k\right) \Big/ \left(\sum_{k=1}^{N} x_k^{2M}\right).$

Using the least-squares technique, we seek a minimum of the function $E(A)$:

(17) $E(A) = \sum_{k=1}^{N}(Ax_k^M - y_k)^2.$

In this case it will suffice to solve $E'(A) = 0$. The derivative is

(18) $E'(A) = 2\sum_{k=1}^{N}(Ax_k^M - y_k)(x_k^M) = 2\sum_{k=1}^{N}(Ax_k^{2M} - x_k^M y_k).$

Hence the coefficient $A$ is the solution of the equation

(19) $0 = A\sum_{k=1}^{N} x_k^{2M} - \sum_{k=1}^{N} x_k^M y_k,$

which reduces to the formula in equation (16).
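Formula (16) is a one-line computation. The sketch below applies it to hypothetical data (exact samples of $y = 3x^2$, which are not from the text) and also evaluates the derivative $E'(A)$ from the proof, which should vanish at the computed coefficient.

```matlab
% Hypothetical data (not from the text): exact samples of y = 3*x^2
M = 2;
x = [0.5 1.0 1.5 2.0];
y = 3*x.^M;
% Formula (16): A = sum(x_k^M * y_k) / sum(x_k^(2M))
A = sum((x.^M).*y) / sum(x.^(2*M));
% Derivative E'(A) from the proof; it should vanish at the minimizer
dE = 2*sum((A*x.^M - y).*(x.^M));
fprintf('A = %.6f, dE = %.2e\n', A, dE);
```

Because the sample data lie exactly on a power curve, the recovered coefficient matches the generating value; with noisy measurements the same two lines give the least-squares estimate.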

Example 5.3. Students collected the experimental data in Table 5.3. The relation is $d = \frac{1}{2}gt^2$, where $d$ is distance in meters and $t$ is time in seconds. Find the gravitational constant $g$.

The values in Table 5.3 are used to find the summations required in formula (16), where the power used is $M = 2$.

The coefficient is $A = 7.68680/1.5664 = 4.9073$, and we get $d = 4.9073t^2$ and $g = 2A = 9.8146$ m/sec$^2$. ■

The following program for constructing a least-squares line is computationally sta¬ 
ble: it gives reliable results in cases when the normal equations (10) are ill conditioned. 
The reader is asked to develop the algorithm for this program in Exercises 4 through 7. 

Program 5.1 (Least-squares Line). To construct the least-squares line $y = Ax + B$ that fits the $N$ data points $(x_1, y_1), \ldots, (x_N, y_N)$.

function [A,B]=lsline(X,Y)
%Input  - X is the 1xn abscissa vector
%       - Y is the 1xn ordinate vector
%Output - A is the coefficient of x in Ax + B
%       - B is the constant coefficient in Ax + B
xmean=mean(X);
ymean=mean(Y);
sumx2=(X-xmean)*(X-xmean)';
sumxy=(Y-ymean)*(X-xmean)';
A=sumxy/sumx2;
B=ymean-A*xmean;
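The steps of Program 5.1 can be tried on a small data set. In the sketch below the data are hypothetical (points lying exactly on $y = 2x + 1$), the body of the program is repeated inline so that the snippet runs on its own, and the built-in `polyfit` is used only as a cross-check.

```matlab
% Hypothetical data (not from the text): points on the line y = 2x + 1
X = [0 1 2 3];
Y = [1 3 5 7];
% Same steps as Program 5.1, written inline so the snippet runs on its own
xmean = mean(X);
ymean = mean(Y);
sumx2 = (X - xmean)*(X - xmean)';
sumxy = (Y - ymean)*(X - xmean)';
A = sumxy/sumx2;
B = ymean - A*xmean;
% Cross-check against the built-in polynomial fit of degree 1
p = polyfit(X, Y, 1);   % p(1) is the slope, p(2) the intercept
fprintf('A = %.4f, B = %.4f\n', A, B);
```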


Exercises for Least-squares Line 


In Exercises 1 and 2, find the least-squares line y = f(x) = Ax + B for the data and 



3. Find the power fit $y = Ax^M$, where $M = 1$, which is a line through the origin, for the
data and calculate

Hint. Use $X_k = x_k - \bar{x}$, $Y_k = y_k - \bar{y}$ and first find the line $Y = AX$.

8. Find the power fits $y = Ax^1$ and $y = Bx^2$ for the following data and use $E_2(f)$ to
determine which curve fits best.



9. Find the power fits $y = A/x$ and $y = B/x^2$ for the following data and use $E_2(f)$ to
determine which curve fits best.



10. (a) Derive the normal equation for finding the least-squares linear fit through the 

origin y = Ax. 

(b) Derive the normal equation for finding the least-squares power fit y = Ax 1 . 

(c) Derive the normal equations for finding the least-squares parabola $y = Ax^2 + B$.

11. Consider the construction of a least-squares line for each of the sets of data points determined by $S_N = \{(k/N, (k/N)^2)\}_{k=1}^{N}$, where $N = 2, 3, 4, \ldots$. Note that, for each value of $N$, the points in $S_N$ all lie on the graph of $f(x) = x^2$ over the closed interval $[0, 1]$. Let $\bar{x}_N$ and $\bar{y}_N$ be the means for the given data points (see Exercise 4). Let $\bar{x}$ be the mean of the values of $x$ in the interval $[0, 1]$, and let $\bar{y}$ be the mean (average) value of $f(x) = x^2$ over the interval $[0, 1]$.

(a) Show $\lim_{N\to\infty} \bar{x}_N = \bar{x}$.

(b) Show $\lim_{N\to\infty} \bar{y}_N = \bar{y}$.

12. Consider the construction of a least-squares line for each of the sets of data points:

$S_N = \left\{\left((b-a)\frac{k}{N} + a,\; f\left((b-a)\frac{k}{N} + a\right)\right)\right\}_{k=1}^{N}$

for $N = 2, 3, 4, \ldots$. Assume that $y = f(x)$ is an integrable function over the closed
interval $[a, b]$. Repeat parts (a) and (b) from Exercise 11.
1. Hooke's law states that $F = kx$, where $F$ is the force (in ounces) used to stretch
a spring and $x$ is the increase in its length (in inches). Use Program 5.1 to find an
approximation to the spring constant $k$ for the following data.

(a)              (b)
x_k    F_k       x_k    F_k
0.2    3.6       0.2    5.3
0.6    10.9      0.6    15.9
0.8    14.5      0.8    21.2
1.0    18.2      1.0    26.4

2. Write a program to find the gravitational constant $g$ for the following sets of data. Use
the power fit that was shown in Example 5.3.

(a)                              (b)
Time, t_k    Distance, d_k       Time, t_k    Distance, d_k
             0.1960                           0.1965
             0.7835                           0.7855
             1.7630                           1.7675
             3.1345                           3.1420
             4.8975                           4.9095


3. The following data give the distances of the nine planets from the sun and their sidereal
period in days.

Planet     Distance from sun (km x 10^6)    Sidereal period (days)
Mercury    57.59                            87.99
Venus      108.11                           224.70
Earth      149.57                           365.26
Mars       227.84                           686.98
Jupiter    778.14                           4,332.4
Saturn     1427.0                           10,759
Uranus     2870.3                           30,684
Neptune    4499.9                           60,188
Pluto      5909.0                           90,710

Modify your program from Problem 2 to also calculate $E_2(f)$. Use it to find the
power fit of the form $y = Cx^{3/2}$ for (a) the first four planets and (b) all nine planets.

4. (a) Find the least-squares line for the data points $\{(x_k, y_k)\}$, where $x_k = (0.1)k$
and $y_k = x_k + \cos(k^{1/2})$.

(b) Calculate $E_2(f)$.

(c) Plot the set of data points and the least-squares line on the same coordinate
system.
Data Linearization Method for y = Ce^{Ax}

Suppose that we are given the points $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$ and want to fit an
exponential curve of the form

(1) $y = Ce^{Ax}.$

The first step is to take the logarithm of both sides:

(2) $\ln(y) = Ax + \ln(C).$

Then introduce the change of variables:

(3) $Y = \ln(y)$, $X = x$, and $B = \ln(C)$.

This results in a linear relation between the new variables $X$ and $Y$:

(4) $Y = AX + B.$

The original points $(x_k, y_k)$ in the xy-plane are transformed into the points $(X_k, Y_k) = (x_k, \ln(y_k))$ in the XY-plane. This process is called data linearization. Then the least-squares line (4) is fit to the points $\{(X_k, Y_k)\}$. The normal equations for finding $A$ and $B$ are

(5) $\left(\sum_{k=1}^{N} X_k^2\right)A + \left(\sum_{k=1}^{N} X_k\right)B = \sum_{k=1}^{N} X_k Y_k,$
    $\left(\sum_{k=1}^{N} X_k\right)A + NB = \sum_{k=1}^{N} Y_k.$

After $A$ and $B$ have been found, the parameter $C$ in equation (1) is computed:

(6) $C = e^B.$
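The procedure in (1) through (6) can be sketched in a few lines. The snippet below uses the five data points of Example 5.4; the names `G`, `r`, and `sol` are illustrative and not part of the text.

```matlab
% Data points of Example 5.4
x = [0 1 2 3 4];
y = [1.5 2.5 3.5 5.0 7.5];
% Change of variables (3)
X = x;
Y = log(y);
N = length(X);
% Normal equations (5) for the line Y = AX + B
G = [sum(X.^2) sum(X); sum(X) N];
r = [sum(X.*Y); sum(Y)];
sol = G\r;
A = sol(1);
B = sol(2);
C = exp(B);             % equation (6)
fprintf('A = %.7f, B = %.6f, C = %.6f\n', A, B, C);
```

Note that this minimizes the squared error in the transformed variable $Y = \ln(y)$, not in $y$ itself.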

Example 5.4. Use the data linearization method and find the exponential fit $y = Ce^{Ax}$
for the five data points (0, 1.5), (1, 2.5), (2, 3.5), (3, 5.0), and (4, 7.5).

Apply the transformation (3) to the original points and obtain

(7) $\{(X_k, Y_k)\} = \{(0, \ln(1.5)), (1, \ln(2.5)), (2, \ln(3.5)), (3, \ln(5.0)), (4, \ln(7.5))\}$
    $= \{(0, 0.40547), (1, 0.91629), (2, 1.25276), (3, 1.60944), (4, 2.01490)\}.$

These transformed points are shown in Figure 5.4 and exhibit a linearized form. The equation of the least-squares line $Y = AX + B$ for the points (7) in Figure 5.4 is

(8) $Y = 0.391202X + 0.457367.$

Table 5.4 Obtaining Coefficients of the Normal Equations for the Transformed Data Points $\{(X_k, Y_k)\}$

Calculation of the coefficients for the normal equations in (5) is shown in Table 5.4.

The resulting linear system (5) for determining $A$ and $B$ is

(9) 30A + 10B = 16.309742
    10A +  5B =  6.198860.

The solution is $A = 0.3912023$ and $B = 0.457367$. Then $C$ is obtained with the calculation
$C = e^{0.457367} = 1.579910$, and these values for $A$ and $C$ are substituted into equation (1)
to obtain the exponential fit (see Figure 5.5):

(10) $y = 1.579910e^{0.3912023x}$ (fit by data linearization).
Nonlinear Least-squares Method for y = Ce^{Ax}

Suppose that we are given the points $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$ and want to fit an
exponential curve:

(11) $y = Ce^{Ax}.$

The nonlinear least-squares procedure requires that we find a minimum of

(12) $E(A, C) = \sum_{k=1}^{N}(Ce^{Ax_k} - y_k)^2.$

The partial derivatives of $E(A, C)$ with respect to $A$ and $C$ are

(13) $\frac{\partial E}{\partial A} = 2\sum_{k=1}^{N}(Ce^{Ax_k} - y_k)(Cx_k e^{Ax_k})$

and

(14) $\frac{\partial E}{\partial C} = 2\sum_{k=1}^{N}(Ce^{Ax_k} - y_k)(e^{Ax_k}).$

When the partial derivatives in (13) and (14) are set equal to zero and then simplified,
the resulting normal equations are

(15) $C\sum_{k=1}^{N} x_k e^{2Ax_k} - \sum_{k=1}^{N} x_k y_k e^{Ax_k} = 0,$
     $C\sum_{k=1}^{N} e^{2Ax_k} - \sum_{k=1}^{N} y_k e^{Ax_k} = 0.$
The equations in (15) are nonlinear in the unknowns $A$ and $C$ and can be solved using
Newton's method. This is a time-consuming computation and the iteration involved
requires good starting values for $A$ and $C$. Many software packages have a built-in
minimization subroutine for functions of several variables that can be used to minimize
$E(A, C)$ directly. For example, the Nelder-Mead simplex algorithm can be used to
minimize (12) directly and bypass the need for equations (13) through (15).


Example 5.5. Use the least-squares method and determine the exponential fit $y = Ce^{Ax}$
for the five data points (0, 1.5), (1, 2.5), (2, 3.5), (3, 5.0), and (4, 7.5).

For this solution we must minimize the quantity $E(A, C)$, which is

(16) $E(A, C) = (C - 1.5)^2 + (Ce^{A} - 2.5)^2 + (Ce^{2A} - 3.5)^2 + (Ce^{3A} - 5.0)^2 + (Ce^{4A} - 7.5)^2.$

We use the fmins command in MATLAB to approximate the values of $A$ and $C$ that minimize $E(A, C)$. First we define $E(A, C)$ as an M-file in MATLAB.

function z=E(u)
A=u(1);
C=u(2);
z=(C-1.5).^2+(C.*exp(A)-2.5).^2+(C.*exp(2*A)-3.5).^2+...
(C.*exp(3*A)-5.0).^2+(C.*exp(4*A)-7.5).^2;

Using the fmins command in the MATLAB Command Window and the initial values
$A = 1.0$ and $C = 1.0$, we find

>>fmins('E',[1 1])
ans =
0.38357046980073 1.61089952247928

Thus the exponential fit to the five data points is

(17) $y = 1.6108995e^{0.3835705x}$ (fit by nonlinear least squares).
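The `fmins` command was removed from later MATLAB releases; `fminsearch` is its replacement and is also available in GNU Octave. The sketch below repeats the minimization of Example 5.5 with `fminsearch` and an anonymous function instead of an M-file. The tolerance settings and the residual names `r1` and `r2` are choices made here for illustration, not part of the text.

```matlab
% Data points of Example 5.5
x = [0 1 2 3 4];
y = [1.5 2.5 3.5 5.0 7.5];
% E(A,C) from (12), with u = [A C]
E = @(u) sum((u(2)*exp(u(1)*x) - y).^2);
opts = optimset('TolX', 1e-10, 'TolFun', 1e-12, ...
                'MaxIter', 2000, 'MaxFunEvals', 4000);
u = fminsearch(E, [1 1], opts);     % Nelder-Mead simplex search
A = u(1);
C = u(2);
% Residuals of the normal equations (15); both should be close to zero
r1 = C*sum(x.*exp(2*A*x)) - sum(x.*y.*exp(A*x));
r2 = C*sum(exp(2*A*x)) - sum(y.*exp(A*x));
fprintf('A = %.7f, C = %.7f\n', A, C);
```

Checking the residuals of (15) is a cheap way to confirm that the simplex search stopped at a stationary point of $E(A, C)$ rather than stalling early.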


A comparison of the solutions using data linearization and nonlinear least squares is
given in Table 5.5. There is a slight difference in the coefficients. For the purpose of
interpolation it can be seen that the approximations differ by no more than 2% over the
interval [0, 4] (see Table 5.5 and Figure 5.6). If there is a normal distribution of the errors
in the data, (17) is usually the preferred choice. When extrapolation beyond the range of
the data is made, the two solutions will diverge and the discrepancy increases to about 6%
when $x = 10$.


Transformations for Data Linearization 

The technique of data linearization has been used by scientists to fit curves such as
$y = Ce^{Ax}$, $y = A\ln(x) + B$, and $y = A/x + B$. Once the curve has been chosen,
a suitable transformation of the variables must be found so that a linear relation is
obtained. For example, the reader can verify that $y = D/(x + C)$ is transformed
into a linear problem $Y = AX + B$ by using the change of variables (and constants)
$X = xy$, $Y = y$, $C = -1/A$, and $D = -B/A$. Graphs of several cases of the
possibilities for the curves are shown in Figure 5.7, and other useful transformations
are given in Table 5.6.
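The change of variables for $y = D/(x + C)$ can be checked numerically. The sketch below uses hypothetical data (exact samples of $y = 3/(x + 2)$, not from the text) and the built-in `polyfit` to fit the line $Y = AX + B$ in the transformed variables.

```matlab
% Hypothetical data (not from the text): exact samples of y = 3/(x + 2)
x = [0 1 2 3 4];
y = 3./(x + 2);
% Change of variables for y = D/(x + C)
X = x.*y;
Y = y;
p = polyfit(X, Y, 1);   % least-squares line Y = A*X + B
A = p(1);
B = p(2);
% Recover the original constants
C = -1/A;
D = -B/A;
fprintf('C = %.6f, D = %.6f\n', C, D);
```

Since $y(x + C) = D$ rearranges to $y = (-1/C)(xy) + D/C$, the slope and intercept of the fitted line determine $C$ and $D$ as shown.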


Figure 5.7 Possibilities for the curves used in "data linearization".

Table 5.6 Change of Variable(s) for Data Linearization

Function, y = f(x) | Linearized form, Y = AX + B | Change of variable(s) and constants
$y = A/x + B$ | $y = A(1/x) + B$ | $X = 1/x$, $Y = y$
$y = D/(x + C)$ | $y = (-1/C)(xy) + D/C$ | $X = xy$, $Y = y$; $C = -1/A$, $D = -B/A$
$y = 1/(Ax + B)$ | $1/y = Ax + B$ | $X = x$, $Y = 1/y$
$y = x/(A + Bx)$ | $1/y = A(1/x) + B$ | $X = 1/x$, $Y = 1/y$
$y = A\ln(x) + B$ | $y = A\ln(x) + B$ | $X = \ln(x)$, $Y = y$
$y = Ce^{Ax}$ | $\ln(y) = Ax + \ln(C)$ | $X = x$, $Y = \ln(y)$; $C = e^B$
$y = Cx^A$ | $\ln(y) = A\ln(x) + \ln(C)$ | $X = \ln(x)$, $Y = \ln(y)$; $C = e^B$
$y = (Ax + B)^{-2}$ | $y^{-1/2} = Ax + B$ | $X = x$, $Y = y^{-1/2}$
$y = Cxe^{-Dx}$ | $\ln(y/x) = -Dx + \ln(C)$ | $X = x$, $Y = \ln(y/x)$; $C = e^B$, $D = -A$
$y = L/(1 + Ce^{Ax})$ | $\ln(L/y - 1) = Ax + \ln(C)$ | $X = x$, $Y = \ln(L/y - 1)$; $C = e^B$ and $L$ is a constant that must be given

Linear Least Squares

The linear least-squares problem is stated as follows. Suppose that $N$ data points
$\{(x_k, y_k)\}$ and a set of $M$ linearly independent functions $\{f_j(x)\}$ are given. We want
to find $M$ coefficients $\{c_j\}$ so that the function $f(x)$ given by the linear combination

(18) $f(x) = \sum_{j=1}^{M} c_j f_j(x)$

will minimize the sum of the squares of the errors

(19) $E(c_1, c_2, \ldots, c_M) = \sum_{k=1}^{N}(f(x_k) - y_k)^2 = \sum_{k=1}^{N}\left(\left(\sum_{j=1}^{M} c_j f_j(x_k)\right) - y_k\right)^2.$
For $E$ to be minimized it is necessary that each partial derivative be zero (i.e.,
$\partial E/\partial c_i = 0$ for $i = 1, 2, \ldots, M$), and this results in the system of equations

(20) $\sum_{k=1}^{N}\left(\left(\sum_{j=1}^{M} c_j f_j(x_k)\right) - y_k\right)(f_i(x_k)) = 0$ for $i = 1, 2, \ldots, M$.

Interchanging the order of the summations in (20) will produce an $M \times M$ system
of linear equations where the unknowns are the coefficients $\{c_j\}$. They are called the
normal equations:

(21) $\sum_{j=1}^{M}\left(\sum_{k=1}^{N} f_i(x_k) f_j(x_k)\right)c_j = \sum_{k=1}^{N} f_i(x_k) y_k$ for $i = 1, 2, \ldots, M$.


The Matrix Formulation 

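Although (21) is easily recognized as a system of $M$ linear equations in $M$ unknowns,
one must be clever so that wasted computations are not performed when writing the
system in matrix notation. The key is to write down the matrices $F$ and $F'$ as follows:

$F = \begin{bmatrix} f_1(x_1) & f_2(x_1) & \cdots & f_M(x_1) \\ f_1(x_2) & f_2(x_2) & \cdots & f_M(x_2) \\ f_1(x_3) & f_2(x_3) & \cdots & f_M(x_3) \\ \vdots & \vdots & & \vdots \\ f_1(x_N) & f_2(x_N) & \cdots & f_M(x_N) \end{bmatrix}, \quad F' = \begin{bmatrix} f_1(x_1) & f_1(x_2) & f_1(x_3) & \cdots & f_1(x_N) \\ f_2(x_1) & f_2(x_2) & f_2(x_3) & \cdots & f_2(x_N) \\ \vdots & \vdots & \vdots & & \vdots \\ f_M(x_1) & f_M(x_2) & f_M(x_3) & \cdots & f_M(x_N) \end{bmatrix}.$

Consider the product of $F'$ and the column matrix $Y$:

(22) $F'Y = \begin{bmatrix} f_1(x_1) & f_1(x_2) & f_1(x_3) & \cdots & f_1(x_N) \\ f_2(x_1) & f_2(x_2) & f_2(x_3) & \cdots & f_2(x_N) \\ \vdots & \vdots & \vdots & & \vdots \\ f_M(x_1) & f_M(x_2) & f_M(x_3) & \cdots & f_M(x_N) \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}.$

The element in the $i$th row of the product $F'Y$ in (22) is the same as the $i$th element
in the column matrix in equation (21); that is,

(23) $\sum_{k=1}^{N} f_i(x_k) y_k = \text{row}_i\, F' \cdot [y_1 \; y_2 \; \cdots \; y_N]'.$

Now consider the product $F'F$, which is an $M \times M$ matrix:

$F'F = \begin{bmatrix} f_1(x_1) & f_1(x_2) & f_1(x_3) & \cdots & f_1(x_N) \\ f_2(x_1) & f_2(x_2) & f_2(x_3) & \cdots & f_2(x_N) \\ \vdots & \vdots & \vdots & & \vdots \\ f_M(x_1) & f_M(x_2) & f_M(x_3) & \cdots & f_M(x_N) \end{bmatrix} \begin{bmatrix} f_1(x_1) & f_2(x_1) & \cdots & f_M(x_1) \\ f_1(x_2) & f_2(x_2) & \cdots & f_M(x_2) \\ f_1(x_3) & f_2(x_3) & \cdots & f_M(x_3) \\ \vdots & \vdots & & \vdots \\ f_1(x_N) & f_2(x_N) & \cdots & f_M(x_N) \end{bmatrix}.$

The element in the $i$th row and $j$th column of $F'F$ is the coefficient of $c_j$ in the
$i$th row in equation (21); that is,

(24) $\sum_{k=1}^{N} f_i(x_k) f_j(x_k) = f_i(x_1)f_j(x_1) + f_i(x_2)f_j(x_2) + \cdots + f_i(x_N)f_j(x_N).$

When $M$ is small, a computationally efficient way to calculate the linear least-squares
coefficients for (18) is to store the matrix $F$, compute $F'F$ and $F'Y$, and then solve
the linear system

(25) $F'FC = F'Y$ for the coefficient matrix $C$.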
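The matrix formulation translates almost directly into MATLAB. The sketch below uses the two basis functions $f_1(x) = \cos(x)$ and $f_2(x) = \sin(x)$ and hypothetical data (exact samples of $y = 2\cos(x) - \sin(x)$); both the basis and the data are illustrative choices, not taken from the text.

```matlab
% Hypothetical data (not from the text): exact samples of y = 2cos(x) - sin(x)
X = [-3.0 -1.5 0.0 1.5 3.0];
Y = 2*cos(X) - sin(X);
% Columns of F hold the basis functions f_1(x) = cos(x), f_2(x) = sin(x)
F = [cos(X') sin(X')];
% Normal equations: (F'*F)*C = F'*Y, with Y as a column
C = (F'*F)\(F'*Y');
fprintf('c1 = %.6f, c2 = %.6f\n', C(1), C(2));
```

Swapping in other basis functions only changes the columns of `F`; the solve step stays the same.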


Polynomial Fitting 

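When the foregoing method is adapted to using the functions $\{f_j(x) = x^{j-1}\}$ and the
index of summation ranges from $j = 1$ to $j = M + 1$, the function $f(x)$ will be a
polynomial of degree $M$:

(26) $f(x) = c_1 + c_2 x + c_3 x^2 + \cdots + c_{M+1}x^M.$

We now show how to find the least-squares parabola, and the extension to a polynomial of higher degree is easily made and is left for the reader.

Theorem 5.3 (Least-squares Parabola). Suppose that $\{(x_k, y_k)\}_{k=1}^{N}$ are $N$ points,
where the abscissas are distinct. The coefficients of the least-squares parabola

(27) $y = f(x) = Ax^2 + Bx + C$

are the solution values $A$, $B$, and $C$ of the linear system

(28) $\left(\sum_{k=1}^{N} x_k^4\right)A + \left(\sum_{k=1}^{N} x_k^3\right)B + \left(\sum_{k=1}^{N} x_k^2\right)C = \sum_{k=1}^{N} y_k x_k^2,$
     $\left(\sum_{k=1}^{N} x_k^3\right)A + \left(\sum_{k=1}^{N} x_k^2\right)B + \left(\sum_{k=1}^{N} x_k\right)C = \sum_{k=1}^{N} y_k x_k,$
     $\left(\sum_{k=1}^{N} x_k^2\right)A + \left(\sum_{k=1}^{N} x_k\right)B + NC = \sum_{k=1}^{N} y_k.$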
Proof. The coefficients $A$, $B$, and $C$ will minimize the quantity:

(29) $E(A, B, C) = \sum_{k=1}^{N}(Ax_k^2 + Bx_k + C - y_k)^2.$

The partial derivatives $\partial E/\partial A$, $\partial E/\partial B$, and $\partial E/\partial C$ must all be zero. This results in

(30) $0 = \frac{\partial E(A, B, C)}{\partial A} = 2\sum_{k=1}^{N}(Ax_k^2 + Bx_k + C - y_k)(x_k^2),$
     $0 = \frac{\partial E(A, B, C)}{\partial B} = 2\sum_{k=1}^{N}(Ax_k^2 + Bx_k + C - y_k)(x_k),$
     $0 = \frac{\partial E(A, B, C)}{\partial C} = 2\sum_{k=1}^{N}(Ax_k^2 + Bx_k + C - y_k)(1).$

Using the distributive property of addition, we can move the values $A$, $B$, and $C$
outside the summations in (30) to obtain the normal equations that are given in (28). ■

Example 5.6. Find the least-squares parabola for the four points (-3,3), (0,1), (2,1), 
and (4, 3). 

The entries in Table 5.7 are used to compute the summations required in the linear 
system (28). 

The linear system (28) for finding $A$, $B$, and $C$ becomes

353A + 45B + 29C = 79
 45A + 29B +  3C =  5
 29A +  3B +  4C =  8.

The solution to the linear system is $A = 585/3278$, $B = -631/3278$, and $C = 1394/1639$,
and the desired parabola is (see Figure 5.8)

$y = \frac{585}{3278}x^2 - \frac{631}{3278}x + \frac{1394}{1639} = 0.178462x^2 - 0.192495x + 0.850519.$ ■
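The sums and the solution in Example 5.6 can be reproduced directly from the four data points. This is a sketch of the normal-equation approach in (28); the names `G`, `r`, and `sol` are illustrative.

```matlab
% Data points of Example 5.6
x = [-3 0 2 4];
y = [3 1 1 3];
N = length(x);
% Linear system (28)
G = [sum(x.^4) sum(x.^3) sum(x.^2);
     sum(x.^3) sum(x.^2) sum(x);
     sum(x.^2) sum(x)    N];
r = [sum(y.*x.^2); sum(y.*x); sum(y)];
sol = G\r;
A = sol(1);
B = sol(2);
C = sol(3);
fprintf('A = %.6f, B = %.6f, C = %.6f\n', A, B, C);
```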
Polynomial Wiggle

It is tempting to use a least-squares polynomial to fit data that is nonlinear. But if the
data do not exhibit a polynomial nature, the resulting curve may exhibit large oscillations. This phenomenon, called polynomial wiggle, becomes more pronounced with
higher-degree polynomials. For this reason we seldom use a polynomial of degree 6 or
above unless it is known that the true function we are working with is a polynomial.

For example, let $f(x) = 1.44/x^2 + 0.24x$ be used to generate the six data points
(0.25, 23.1), (1.0, 1.68), (1.5, 1.0), (2.0, 0.84), (2.4, 0.826), and (5.0, 1.2576). The
result of curve fitting with the least-squares polynomials

$P_2(x) = 22.93 - 16.96x + 2.553x^2,$

$P_3(x) = 33.04 - 46.51x + 19.51x^2 - 2.296x^3,$

$P_4(x) = 39.92 - 80.93x + 58.39x^2 - 17.15x^3 + 1.680x^4,$

and

$P_5(x) = 46.02 - 118.1x + 119.4x^2 - 57.51x^3 + 13.03x^4 - 1.085x^5$

is shown in Figure 5.9(a) through (d). Notice that $P_3(x)$, $P_4(x)$, and $P_5(x)$ exhibit a
large wiggle in the interval [2, 5]. Even though $P_5(x)$ goes through the six points, it
produces the worst fit. If we must fit a polynomial to these data, $P_2(x)$ should be the
choice.
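One way to see the wiggle numerically is to compare each fitted polynomial with the generating function on the interval [2, 5]. The sketch below does this with the built-in `polyfit` and `polyval`; the sampling grid `t` and the error measure `maxerr` are choices made here for illustration and are not part of the text.

```matlab
% Six data points generated from f(x) = 1.44/x^2 + 0.24x (from the text)
x = [0.25 1.0 1.5 2.0 2.4 5.0];
y = [23.1 1.68 1.0 0.84 0.826 1.2576];
f = @(t) 1.44./t.^2 + 0.24*t;
t = linspace(2, 5, 61);                 % sample the interval [2, 5]
maxerr = zeros(1, 4);
for deg = 2:5
    p = polyfit(x, y, deg);             % least-squares polynomial of this degree
    maxerr(deg-1) = max(abs(polyval(p, t) - f(t)));
end
p5 = polyfit(x, y, 5);
resid5 = max(abs(polyval(p5, x) - y));  % P_5 passes through all six points
disp(maxerr);
```

A small residual at the data points together with a large error between them is the signature of polynomial wiggle.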

The following program uses the matrix $F$ with entries $f_j(x) = x^{j-1}$ from equation (18).

Figure 5.9 (a) Using $P_2(x)$ to fit data. (b) Using $P_3(x)$ to fit data. (c) Using $P_4(x)$ to
fit data. (d) Using $P_5(x)$ to fit data.


Program 5.2 (Least-squares Polynomial). To construct the least-squares polynomial of degree $M$ of the form

$P_M(x) = c_1 + c_2 x + c_3 x^2 + \cdots + c_M x^{M-1} + c_{M+1}x^M$

that fits the $N$ data points $\{(x_k, y_k)\}_{k=1}^{N}$.

function C = lspoly(X,Y,M)
%Input  - X is the 1xn abscissa vector
%       - Y is the 1xn ordinate vector
%       - M is the degree of the least-squares polynomial
%Output - C is the coefficient list for the polynomial
n=length(X);
B=zeros(M+1,1);
F=zeros(n,M+1);
%Fill the columns of F with the powers of X
for k=1:M+1
F(:,k)=X'.^(k-1);
end
%Solve the linear system from (25)
A=F'*F;
B=F'*Y';
C=A\B;
C=flipud(C);


Exercises for Curve Fitting



For the given set of data, find

(a) $f(x) = Ce^{Ax}$, by using the change of variables $X = x$, $Y = \ln(y)$, and $C = e^B$,
from Table 5.6, to linearize the data points.

(b) $f(x) = Cx^A$, by using the change of variables $X = \ln(x)$, $Y = \ln(y)$, and
$C = e^B$, from Table 5.6, to linearize the data points.

(c) Use $E_2(f)$ to determine which curve gives the best fit.
4. For the given set of data, find the least-squares curve:

(a) $f(x) = Ce^{Ax}$, by using the change of variables $X = x$, $Y = \ln(y)$, and $C = e^B$,
from Table 5.6, to linearize the data points.

(b) $f(x) = 1/(Ax + B)$, by using the change of variables $X = x$ and $Y = 1/y$,
from Table 5.6, to linearize the data points.

(c) Use $E_2(f)$ to determine which curve gives the best fit.

x_k    y_k
-1     6.62
0      3.94
1      2.17
2      1.35
3      0.89


5. For each set of data, find the least-squares curve:

(a) $f(x) = Ce^{Ax}$, by using the change of variables $X = x$, $Y = \ln(y)$, and $C = e^B$,
from Table 5.6, to linearize the data points.

(b) $f(x) = (Ax + B)^{-2}$, by using the change of variables $X = x$ and $Y = y^{-1/2}$,
from Table 5.6, to linearize the data points.

(c) Use $E_2(f)$ to determine which curve gives the best fit.

(a)               (b)
x_k    y_k        x_k    y_k
-1     13.45      -1     13.65
0      3.01       0      1.38
1      0.67       1      0.49
2      0.15       3      0.15


6. Logistic population growth. When the population $P(t)$ is bounded by the limiting
value $L$, it follows a logistic curve and has the form $P(t) = L/(1 + Ce^{At})$. Find $A$
and $C$ for the following data, where $L$ is a known value.

(a) (0, 200), (1,400), (2, 650), (3, 850), (4, 950), and L = 1000. 

(b) (0, 500), (1, 1000), (2, 1800), (3,2800), (4, 3700), and L = 5000. 

7. Use the data for the U.S. population and find the logistic curve $P(t)$. Estimate the
population in the year 2000.

(a) Assume that $L = 8 \times 10^8$.      (b) Assume that $L = 8 \times 10^8$.

Year    t_k    P_k                Year    t_k    P_k
1800    -10    5.3                1900    0      76.1
1850    -5     23.2               1920    2      106.5
1900    0      76.1               1940    4      132.6
1950    5      152.3              1960    6      180.7
                                  1980    8      226.5


In Exercises 8 through 15, carry out the indicated change of variables in Table 5.6, and
derive the linearized form for each of the following functions.

9. $y = D/(x + C)$

10. $y = 1/(Ax + B)$

12. $y = A\ln(x) + B$

14. $y = (Ax + B)^{-2}$

16. (a) Follow the procedure outlined in the proof of Theorem 5.3 and derive the normal
equations for the least-squares curve $f(x) = A\cos(x) + B\sin(x)$.

(b) Use the results from part (a) to find the least-squares curve $f(x) = A\cos(x) + B\sin(x)$ for the following data:

x_k     y_k
-3.0    -0.1385
-1.5    -2.1587
0.0     0.8330
1.5     2.2774
3.0     -0.5110

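17. The least-squares plane $z = Ax + By + C$ for the $N$ points $(x_1, y_1, z_1), \ldots, (x_N, y_N, z_N)$ is obtained by minimizing

$E(A, B, C) = \sum_{k=1}^{N}(Ax_k + By_k + C - z_k)^2.$

Derive the normal equations:

$\left(\sum_{k=1}^{N} x_k^2\right)A + \left(\sum_{k=1}^{N} x_k y_k\right)B + \left(\sum_{k=1}^{N} x_k\right)C = \sum_{k=1}^{N} z_k x_k,$

$\left(\sum_{k=1}^{N} x_k y_k\right)A + \left(\sum_{k=1}^{N} y_k^2\right)B + \left(\sum_{k=1}^{N} y_k\right)C = \sum_{k=1}^{N} z_k y_k,$

$\left(\sum_{k=1}^{N} x_k\right)A + \left(\sum_{k=1}^{N} y_k\right)B + NC = \sum_{k=1}^{N} z_k.$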

18. Find the least-squares planes for the following data. 

(a) (1,1,7), (1, 2,9), (2,1,10), (2, 2,11), (2, 3, 12) 

(b) (1,2,6), (2,3,7), (1,1,8), (2,2, 8), (2,1,9) 

(c) (3,1, -3), (2,1,-1), (2, 2,0), (1,1,1), (1, 2, 3) 

19. Consider the following table of data.

x_k    y_k
2.0    5.0
3.0    10.0
4.0    17.0
5.0    26.0

When the change of variables $X = xy$ and $Y = 1/y$ are used with the function
$y = D/(x + C)$, the transformed least-squares fit is

$y = \frac{-17.719403}{x - 5.476617}.$

When the change of variables $X = x$ and $Y = 1/y$ are used with the function
$y = 1/(Ax + B)$, the transformed least-squares fit is

$y = \frac{1}{-0.1064253x + 0.4987330}.$

Determine which fit is best and why one of the solutions is completely absurd.


Algorithms and Programs

1. The temperature cycle in a suburb of Los Angeles on November 8 is given in the
accompanying table below. There are 24 data points.

(a) Follow the procedure outlined in Example 5.5 (use the fmins command) to find
the least-squares curve of the form $f(x) = A\cos(Bx) + C\sin(Dx)$ for the given
set of data.

(b) Determine $E_2(f)$.

(c) Plot the data and the least-squares curve from part (a) on the same coordinate
system.



Interpolation by Spline Functions 

Polynomial interpolation for a set of $N + 1$ points $\{(x_k, y_k)\}_{k=0}^{N}$ is frequently unsatisfactory. As discussed in Section 5.2, a polynomial of degree $N$ can have $N - 1$ relative
maxima and minima, and the graph can wiggle in order to pass through the points.
Another method is to piece together the graphs of lower-degree polynomials $S_k(x)$ and
interpolate between the successive nodes $(x_k, y_k)$ and $(x_{k+1}, y_{k+1})$ (see Figure 5.10).

Figure 5.10 Piecewise polynomial interpolation.

Figure 5.11 Piecewise linear interpolation (a linear spline).

The two adjacent portions of the curve $y = S_k(x)$ and $y = S_{k+1}(x)$, which lie above
$[x_k, x_{k+1}]$ and $[x_{k+1}, x_{k+2}]$, respectively, pass through the common knot $(x_{k+1}, y_{k+1})$.
The two portions of the graph are tied together at the knot $(x_{k+1}, y_{k+1})$, and the set of
functions $\{S_k(x)\}$ forms a piecewise polynomial curve, which is denoted by $S(x)$.


Piecewise Linear interpolation 

The simplest polynomial to use, a polynomial of degree 1, produces a polygonal path
that consists of line segments that pass through the points. The Lagrange polynomial
from Section 4.3 is used to represent this piecewise linear curve:

(1) $S_k(x) = y_k\frac{x - x_{k+1}}{x_k - x_{k+1}} + y_{k+1}\frac{x - x_k}{x_{k+1} - x_k}$ for $x_k \le x \le x_{k+1}$.

The resulting curve looks like a broken line (see Figure 5.11).

An equivalent expression can be obtained if we use the point-slope formula for a
line segment:

$S_k(x) = y_k + d_k(x - x_k),$

where $d_k = (y_{k+1} - y_k)/(x_{k+1} - x_k)$. The resulting linear spline function can be
written in the form

(2) $S(x) = \begin{cases} y_0 + d_0(x - x_0) & \text{for } x \text{ in } [x_0, x_1], \\ y_1 + d_1(x - x_1) & \text{for } x \text{ in } [x_1, x_2], \\ \quad\vdots \\ y_k + d_k(x - x_k) & \text{for } x \text{ in } [x_k, x_{k+1}], \\ \quad\vdots \\ y_{N-1} + d_{N-1}(x - x_{N-1}) & \text{for } x \text{ in } [x_{N-1}, x_N]. \end{cases}$

The form of equation (2) is better than equation (1) for the explicit calculation of
$S(x)$. It is assumed that the abscissas are ordered $x_0 < x_1 < \cdots < x_{N-1} < x_N$. For
a fixed value of $x$, the interval $[x_k, x_{k+1}]$ containing $x$ can be found by successively
computing the differences $x - x_1, \ldots, x - x_k, x - x_{k+1}$ until $k + 1$ is the smallest
integer such that $x - x_{k+1} < 0$. Hence we have found $k$ so that $x_k \le x \le x_{k+1}$, and
the value of the spline function $S(x)$ is

(3) $S(x) = S_k(x) = y_k + d_k(x - x_k)$ for $x_k \le x \le x_{k+1}$.
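The interval search and the evaluation in (3) can be sketched as follows. The nodes are hypothetical (not from the text), MATLAB indices start at 1 so `X(k)` stores $x_{k-1}$, and the built-in `interp1` is used only as a cross-check.

```matlab
% Hypothetical nodes (not from the text), abscissas in increasing order
X = [0 1 2 4];
Y = [1 3 2 6];
x = 2.5;                        % point at which to evaluate the spline
d = diff(Y)./diff(X);           % slopes d_k of the line segments
% Step through the differences x - X(k+1) until one is negative
k = 1;
while k < length(X)-1 && x - X(k+1) >= 0
    k = k + 1;
end
S = Y(k) + d(k)*(x - X(k));     % equation (3) on the located interval
Scheck = interp1(X, Y, x);      % built-in linear interpolation
fprintf('S(%.2f) = %.4f\n', x, S);
```

The loop guard `k < length(X)-1` keeps the last interval in use when `x` equals the right end point.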

These techniques can be extended to higher-order polynomials. For example, if an
odd number of nodes $x_0, x_1, \ldots, x_{2M}$ is given, then a piecewise quadratic polynomial can be constructed on each subinterval $[x_{2k}, x_{2k+2}]$, for $k = 0, 1, \ldots, M - 1$.
A shortcoming of the resulting quadratic spline is that the curvature at the even nodes
$x_{2k}$ changes abruptly, and this can cause an undesired bend or distortion in the graph.
The second derivative of a quadratic spline is discontinuous at the even nodes. If we
use piecewise cubic polynomials, then both the first and second derivatives can be
made continuous.


Piecewise Cubic Splines 

The fitting of a polynomial curve to a set of data points has applications in CAD
(computer-assisted design), CAM (computer-assisted manufacturing), and computer
graphics systems. An operator wants to draw a smooth curve through data points that
are not subject to error. Traditionally, it was common to use a french curve or an architect's spline and subjectively draw a curve that looks smooth when viewed by the
eye. Mathematically, it is possible to construct cubic functions $S_k(x)$ on each interval $[x_k, x_{k+1}]$ so that the resulting piecewise curve $y = S(x)$ and its first and second
derivatives are all continuous on the larger interval $[x_0, x_N]$. The continuity of $S'(x)$
means that the graph $y = S(x)$ will not have sharp corners. The continuity of $S''(x)$
means that the radius of curvature is defined at each point.

Definition 5.1 (Cubic Spline Interpolant). Suppose that $\{(x_k, y_k)\}_{k=0}^{N}$ are $N + 1$
points, where $a = x_0 < x_1 < \cdots < x_N = b$. The function $S(x)$ is called a cubic
spline if there exist $N$ cubic polynomials $S_k(x)$ with coefficients $s_{k,0}$, $s_{k,1}$, $s_{k,2}$, and
$s_{k,3}$ that satisfy the properties:

I. $S(x) = S_k(x) = s_{k,0} + s_{k,1}(x - x_k) + s_{k,2}(x - x_k)^2 + s_{k,3}(x - x_k)^3$ for $x \in [x_k, x_{k+1}]$ and $k = 0, 1, \ldots, N - 1$.

II. $S(x_k) = y_k$ for $k = 0, 1, \ldots, N$.

III. $S_k(x_{k+1}) = S_{k+1}(x_{k+1})$ for $k = 0, 1, \ldots, N - 2$.

IV. $S_k'(x_{k+1}) = S_{k+1}'(x_{k+1})$ for $k = 0, 1, \ldots, N - 2$.

V. $S_k''(x_{k+1}) = S_{k+1}''(x_{k+1})$ for $k = 0, 1, \ldots, N - 2$.

Property I states that $S(x)$ consists of piecewise cubics. Property II states that the
piecewise cubics interpolate the given set of data points. Properties III and IV require
that the piecewise cubics represent a smooth continuous function. Property V states
that the second derivative of the resulting function is also continuous.


Existence of Cubic Splines 

Let us try to determine if it is possible to construct a cubic spline that satisfies properties I through V. Each cubic polynomial $S_k(x)$ has four unknown constants ($s_{k,0}$, $s_{k,1}$,
$s_{k,2}$, and $s_{k,3}$); hence there are $4N$ coefficients to be determined. Loosely speaking,
we have $4N$ degrees of freedom or conditions that must be specified. The data points
supply $N + 1$ conditions, and properties III, IV, and V each supply $N - 1$ conditions.
Hence, $N + 1 + 3(N - 1) = 4N - 2$ conditions are specified. This leaves us two additional degrees of freedom. We will call them end-point constraints: they will involve
either $S'(x)$ or $S''(x)$ at $x_0$ and $x_N$ and will be discussed later. We now proceed with
the construction.

Since $S(x)$ is piecewise cubic, its second derivative $S''(x)$ is piecewise linear on
$[x_0, x_N]$. The linear Lagrange interpolation formula gives the following representation
for $S''(x) = S_k''(x)$:

(4) $S_k''(x) = S''(x_k)\frac{x - x_{k+1}}{x_k - x_{k+1}} + S''(x_{k+1})\frac{x - x_k}{x_{k+1} - x_k}.$

Use $m_k = S''(x_k)$, $m_{k+1} = S''(x_{k+1})$, and $h_k = x_{k+1} - x_k$ in (4) to get

(5) $S_k''(x) = \frac{m_k}{h_k}(x_{k+1} - x) + \frac{m_{k+1}}{h_k}(x - x_k)$

for $x_k \le x \le x_{k+1}$ and $k = 0, 1, \ldots, N - 1$. Integrating (5) twice will introduce two
constants of integration, and the result can be manipulated so that it has the form

(6) $S_k(x) = \frac{m_k}{6h_k}(x_{k+1} - x)^3 + \frac{m_{k+1}}{6h_k}(x - x_k)^3 + p_k(x_{k+1} - x) + q_k(x - x_k).$

Substituting $x_k$ and $x_{k+1}$ into equation (6) and using the values $y_k = S_k(x_k)$ and
$y_{k+1} = S_k(x_{k+1})$ yields the following equations that involve $p_k$ and $q_k$, respectively:

(7) $y_k = \frac{m_k}{6}h_k^2 + p_k h_k$ and $y_{k+1} = \frac{m_{k+1}}{6}h_k^2 + q_k h_k.$

These two equations are easily solved for $p_k$ and $q_k$, and when these values are substituted into equation (6), the result is the following expression for the cubic function
$S_k(x)$:

(8) $S_k(x) = \frac{m_k}{6h_k}(x_{k+1} - x)^3 + \frac{m_{k+1}}{6h_k}(x - x_k)^3 + \left(\frac{y_k}{h_k} - \frac{m_k h_k}{6}\right)(x_{k+1} - x) + \left(\frac{y_{k+1}}{h_k} - \frac{m_{k+1}h_k}{6}\right)(x - x_k).$

Expression (8) involves only the unknown coefficients $\{m_k\}$. To find these values, we must use the derivative of (8),
which is

(9) $S_k'(x) = -\frac{m_k}{2h_k}(x_{k+1} - x)^2 + \frac{m_{k+1}}{2h_k}(x - x_k)^2 - \left(\frac{y_k}{h_k} - \frac{m_k h_k}{6}\right) + \frac{y_{k+1}}{h_k} - \frac{m_{k+1}h_k}{6}.$

Evaluating (9) at $x_k$ and simplifying the result yield

(10) $S_k'(x_k) = -\frac{m_k}{3}h_k - \frac{m_{k+1}}{6}h_k + d_k$, where $d_k = \frac{y_{k+1} - y_k}{h_k}$.

Similarly, we can replace $k$ by $k - 1$ in (9) to get the expression for $S_{k-1}'(x)$ and
evaluate it at $x_k$ to obtain

(11) $S_{k-1}'(x_k) = \frac{m_k}{3}h_{k-1} + \frac{m_{k-1}}{6}h_{k-1} + d_{k-1}.$

Now use property IV and equations (10) and (11) to obtain an important relation
involving $m_{k-1}$, $m_k$, and $m_{k+1}$:

(12) $h_{k-1}m_{k-1} + 2(h_{k-1} + h_k)m_k + h_k m_{k+1} = u_k,$

where $u_k = 6(d_k - d_{k-1})$ for $k = 1, 2, \ldots, N - 1$.


Construction of Cubic Splines 

Observe that the unknowns in (12) are the desired values $\{m_k\}$, and the other terms
are constants obtained by performing simple arithmetic with the data points $\{(x_k, y_k)\}$.
Therefore, in reality system (12) is an underdetermined system of $N - 1$ linear equations involving $N + 1$ unknowns. Hence two additional equations must be supplied.
They are used to eliminate $m_0$ from the first equation and $m_N$ from the $(N - 1)$st
equation in system (12). The standard strategies for the end-point constraints are summarized in Table 5.8.

Consider strategy (v) in Table 5.8. If $m_0$ is given, then $h_0 m_0$ can be computed, and
the first equation (when $k = 1$) of (12) is

(13) $2(h_0 + h_1)m_1 + h_1 m_2 = u_1 - h_0 m_0.$

Similarly, if $m_N$ is given, then $h_{N-1}m_N$ can be computed, and the last equation (when
$k = N - 1$) of (12) is

(14) $h_{N-2}m_{N-2} + 2(h_{N-2} + h_{N-1})m_{N-1} = u_{N-1} - h_{N-1}m_N.$
Table 5.8 End-point Constraints for a Cubic Spline



Equations (13) and (14) with (12) used for $k = 2, 3, \ldots, N - 2$ form $N - 1$ linear
equations involving the coefficients $m_1, m_2, \ldots, m_{N-1}$.

Regardless of the particular strategy chosen in Table 5.8, we can rewrite equations 1 and $N - 1$ in (12) and obtain a tridiagonal linear system of the form $HM = V$,
which involves $m_1, m_2, \ldots, m_{N-1}$:


(15) 


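The linear system in (15) is strictly diagonally dominant and has a unique solution
(see Chapter 3 for details). After the coefficients $\{m_k\}$ are determined, the spline
coefficients $\{s_{k,j}\}$ for $S_k(x)$ are computed using the formulas

(16)    $s_{k,0} = y_k$,  $s_{k,1} = d_k - \frac{h_k(2m_k + m_{k+1})}{6}$,  $s_{k,2} = \frac{m_k}{2}$,  $s_{k,3} = \frac{m_{k+1} - m_k}{6h_k}$.

Each cubic polynomial $S_k(x)$ can be written in nested multiplication form for efficient computation:

(17)    $S_k(x) = ((s_{k,3}w + s_{k,2})w + s_{k,1})w + y_k$,  where $w = x - x_k$,

and $S_k(x)$ is used on the interval $x_k \le x \le x_{k+1}$.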
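The nested form can be sketched as a one-line anonymous function. The name `spline_piece` is a hypothetical helper, not part of the text's programs; the coefficients are those of $S_1(x)$ in Example 5.7, listed in descending order as in the output of Program 5.3.

```matlab
% Hypothetical helper: evaluate one cubic piece by nested multiplication.
% s  = [s_k3 s_k2 s_k1 s_k0], coefficients in descending order
% xk = left node of the interval, x = evaluation point(s)
spline_piece = @(s, xk, x) ((s(1).*(x - xk) + s(2)).*(x - xk) + s(3)).*(x - xk) + s(4);

% Coefficients of S_1(x) from Example 5.7, valid for 1 <= x <= 2.
s1 = [-1.04 1.26 1.28 0.5];

v_left  = spline_piece(s1, 1, 1);    % value at the left node  x = 1
v_right = spline_piece(s1, 1, 2);    % value at the right node x = 2
disp([v_left v_right])
```

The nested form needs three multiplications per evaluation and agrees with `polyval(s1, x - xk)`.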

Equations (12) together with a strategy from Table 5.8 can be used to construct a
cubic spline with distinctive properties at the end points. Specifically, the values for $m_0$
and $m_N$ in Table 5.8 are used to customize the first and last equations in (12) and form
the system of $N-1$ equations given in (15). Then the tridiagonal system is solved for
the remaining coefficients $m_1, m_2, \ldots, m_{N-1}$. Finally, the formulas in (16) are used to
determine the spline coefficients. For reference, we now state how the equations must
be prepared for each different type of spline.


End-point Constraints 

The following five lemmas show the form of the tridiagonal linear system that must be 
solved for each of the different endpoint constraints in Table 5.8. 

Lemma 5.1 (Clamped Spline). There exists a unique cubic spline with the first
derivative boundary conditions $S'(a) = d_0$ and $S'(b) = d_N$.

Proof. Solve the linear system

$\left(\frac{3}{2}h_0 + 2h_1\right)m_1 + h_1 m_2 = u_1 - 3(d_0 - S'(x_0))$

$h_{k-1}m_{k-1} + 2(h_{k-1} + h_k)m_k + h_k m_{k+1} = u_k$  for $k = 2, 3, \ldots, N-2$

$h_{N-2}m_{N-2} + \left(2h_{N-2} + \frac{3}{2}h_{N-1}\right)m_{N-1} = u_{N-1} - 3(S'(x_N) - d_{N-1})$.  •


Remark. The clamped spline involves slope at the ends. This spline can be visualized
as the curve obtained when a flexible elastic rod is forced to pass through the data
points, and the rod is clamped at each end with a fixed slope. This spline would be
useful to a draftsman for drawing a smooth curve through several points.

Lemma 5.2 (Natural Spline). There exists a unique cubic spline with the free
boundary conditions $S''(a) = 0$ and $S''(b) = 0$.

Proof. Solve the linear system

$2(h_0 + h_1)m_1 + h_1 m_2 = u_1$

$h_{k-1}m_{k-1} + 2(h_{k-1} + h_k)m_k + h_k m_{k+1} = u_k$  for $k = 2, 3, \ldots, N-2$

$h_{N-2}m_{N-2} + 2(h_{N-2} + h_{N-1})m_{N-1} = u_{N-1}$.  •


Remark. The natural spline is the curve obtained by forcing a flexible elastic rod
through the data points, but letting the slope at the ends be free to equilibrate to the
position that minimizes the oscillatory behavior of the curve. It is useful for fitting a
curve to experimental data that are accurate to several significant digits.

Lemma 5.3 (Extrapolated Spline). There exists a unique cubic spline that uses
extrapolation from the interior nodes at $x_1$ and $x_2$ to determine $S''(a)$ and extrapolation
from the nodes at $x_{N-1}$ and $x_{N-2}$ to determine $S''(b)$.

Proof. Solve the linear system

$\left(3h_0 + 2h_1 + \frac{h_0^2}{h_1}\right)m_1 + \left(h_1 - \frac{h_0^2}{h_1}\right)m_2 = u_1$

$h_{k-1}m_{k-1} + 2(h_{k-1} + h_k)m_k + h_k m_{k+1} = u_k$  for $k = 2, 3, \ldots, N-2$

$\left(h_{N-2} - \frac{h_{N-1}^2}{h_{N-2}}\right)m_{N-2} + \left(2h_{N-2} + 3h_{N-1} + \frac{h_{N-1}^2}{h_{N-2}}\right)m_{N-1} = u_{N-1}$.  •


Remark. The extrapolated spline is equivalent to assuming that the end cubic is an
extension of the adjacent cubic; that is, the spline forms a single cubic curve over the
interval $[x_0, x_2]$ and another single cubic over the interval $[x_{N-2}, x_N]$.

Lemma 5.4 (Parabolically Terminated Spline). There exists a unique cubic spline
that uses $S'''(x) \equiv 0$ on the interval $[x_0, x_1]$ and $S'''(x) \equiv 0$ on $[x_{N-1}, x_N]$.

Proof. Solve the linear system

$(3h_0 + 2h_1)m_1 + h_1 m_2 = u_1$

$h_{k-1}m_{k-1} + 2(h_{k-1} + h_k)m_k + h_k m_{k+1} = u_k$  for $k = 2, 3, \ldots, N-2$

$h_{N-2}m_{N-2} + (2h_{N-2} + 3h_{N-1})m_{N-1} = u_{N-1}$.  •


Remark. The assumption that $S'''(x) \equiv 0$ on the interval $[x_0, x_1]$ forces the cubic to
degenerate to a quadratic over $[x_0, x_1]$, and a similar situation occurs over $[x_{N-1}, x_N]$.

Lemma 5.5 (End-point Curvature-adjusted Spline). There exists a unique cubic
spline with the second derivative boundary conditions $S''(a)$ and $S''(b)$ specified.

Proof. Solve the linear system

$2(h_0 + h_1)m_1 + h_1 m_2 = u_1 - h_0 S''(x_0)$

$h_{k-1}m_{k-1} + 2(h_{k-1} + h_k)m_k + h_k m_{k+1} = u_k$  for $k = 2, 3, \ldots, N-2$

$h_{N-2}m_{N-2} + 2(h_{N-2} + h_{N-1})m_{N-1} = u_{N-1} - h_{N-1}S''(x_N)$.  •


Remark. Imposing values for S" (a) and S"(b) permits the practitioner to adjust the 
curvature at each endpoint. 

The next five examples illustrate the behavior of the various splines. It is possible 
to mix the end conditions to obtain an even wider variety of possibilities, but we leave 
these variations to the reader to investigate. 

Example 5.7. Find the clamped cubic spline that passes through (0, 0.0), (1, 0.5), (2, 2.0),
and (3, 1.5) with the first derivative boundary conditions $S'(0) = 0.2$ and $S'(3) = -1$.
First, compute the quantities

$h_0 = h_1 = h_2 = 1$
$d_0 = (y_1 - y_0)/h_0 = (0.5 - 0.0)/1 = 0.5$
$d_1 = (y_2 - y_1)/h_1 = (2.0 - 0.5)/1 = 1.5$
$d_2 = (y_3 - y_2)/h_2 = (1.5 - 2.0)/1 = -0.5$
$u_1 = 6(d_1 - d_0) = 6(1.5 - 0.5) = 6.0$
$u_2 = 6(d_2 - d_1) = 6(-0.5 - 1.5) = -12.0$.

Then use Lemma 5.1 and obtain the equations

$\left(\frac{3}{2} + 2\right)m_1 + m_2 = 6.0 - 3(0.5 - 0.2) = 5.1$,

$m_1 + \left(2 + \frac{3}{2}\right)m_2 = -12.0 - 3(-1.0 - (-0.5)) = -10.5$.

When these equations are simplified and put in matrix notation, we have

$\begin{bmatrix} 3.5 & 1.0 \\ 1.0 & 3.5 \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \end{bmatrix} = \begin{bmatrix} 5.1 \\ -10.5 \end{bmatrix}$.

It is a straightforward task to compute the solution $m_1 = 2.52$ and $m_2 = -3.72$. Now
apply the equations in (i) of Table 5.8 to determine the coefficients $m_0$ and $m_3$:

$m_0 = 3(0.5 - 0.2) - \frac{2.52}{2} = -0.36$,

$m_3 = 3(-1.0 + 0.5) - \frac{-3.72}{2} = 0.36$.
Figure 5.12 The clamped cubic spline with derivative boundary conditions: $S'(0) = 0.2$ and $S'(3) = -1$.

Figure 5.13 The natural cubic spline with $S''(0) = 0$ and $S''(3) = 0$.


Next, the values $m_0 = -0.36$, $m_1 = 2.52$, $m_2 = -3.72$, and $m_3 = 0.36$ are substituted
into the equations (16) to find the spline coefficients. The solution is

(18)    $S_0(x) = 0.48x^3 - 0.18x^2 + 0.2x$  for $0 \le x \le 1$,
        $S_1(x) = -1.04(x-1)^3 + 1.26(x-1)^2 + 1.28(x-1) + 0.5$  for $1 \le x \le 2$,
        $S_2(x) = 0.68(x-2)^3 - 1.86(x-2)^2 + 0.68(x-2) + 2.0$  for $2 \le x \le 3$.

This clamped cubic spline is shown in Figure 5.12. ■ 
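The arithmetic of this example can be checked with a short script. This is only a sketch for the unit step sizes of the example: the 2-by-2 matrix and right-hand side are the Lemma 5.1 equations written out, and the two end values use the clamped formulas as they appear at the end of Program 5.3 with $h_0 = h_2 = 1$.

```matlab
% Lemma 5.1 system for the data of Example 5.7 (h_k = 1 for all k).
Hm = [3.5 1.0;
      1.0 3.5];
V  = [5.1; -10.5];
M  = Hm \ V;                          % M(1) = m_1, M(2) = m_2

% End values for the clamped spline (unit step sizes assumed).
m0 = 3*(0.5 - 0.2)      - M(1)/2;     % uses d_0 = 0.5,  S'(0) = 0.2
m3 = 3*(-1.0 - (-0.5))  - M(2)/2;     % uses d_2 = -0.5, S'(3) = -1
fprintf('m0=%.2f  m1=%.2f  m2=%.2f  m3=%.2f\n', m0, M(1), M(2), m3);
```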

Example 5.8. Find the natural cubic spline that passes through (0, 0.0), (1, 0.5), (2, 2.0),
and (3, 1.5) with the free boundary conditions $S''(0) = 0$ and $S''(3) = 0$.

Use the same values $\{h_k\}$, $\{d_k\}$, and $\{u_k\}$ that were computed in Example 5.7. Then
use Lemma 5.2 and obtain the equations

$2(1 + 1)m_1 + m_2 = 6.0$,

$m_1 + 2(1 + 1)m_2 = -12.0$.

The matrix form of this linear system is

$\begin{bmatrix} 4.0 & 1.0 \\ 1.0 & 4.0 \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \end{bmatrix} = \begin{bmatrix} 6.0 \\ -12.0 \end{bmatrix}$.

It is easy to find the solution $m_1 = 2.4$ and $m_2 = -3.6$. Since $m_0 = S''(0) = 0$ and
$m_3 = S''(3) = 0$, when equations (16) are used to find the spline coefficients, the result is

(19)    $S_0(x) = 0.4x^3 + 0.1x$  for $0 \le x \le 1$,
        $S_1(x) = -(x-1)^3 + 1.2(x-1)^2 + 1.3(x-1) + 0.5$  for $1 \le x \le 2$,
        $S_2(x) = 0.6(x-2)^3 - 1.8(x-2)^2 + 0.7(x-2) + 2.0$  for $2 \le x \le 3$.

This natural cubic spline is shown in Figure 5.13. ■
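A minimal sketch of the whole natural-spline computation for these four points follows. It solves the 2-by-2 system with the backslash operator instead of the hand elimination, then forms the coefficients row by row with the same expressions used in the final loop of Program 5.3.

```matlab
X = [0 1 2 3];   Y = [0 0.5 2.0 1.5];
H = diff(X);     D = diff(Y)./H;     U = 6*diff(D);

% Natural spline: m_0 = m_3 = 0, so (12) reduces to a 2-by-2 system.
Hm = [2*(H(1)+H(2)),  H(2);
      H(2),           2*(H(2)+H(3))];
M  = [0; Hm \ U'; 0];                 % [m_0; m_1; m_2; m_3]

% Row k holds the coefficients of S_{k-1} in descending powers of (x - x_{k-1}).
S = zeros(3,4);
for k = 1:3
    S(k,1) = (M(k+1) - M(k)) / (6*H(k));
    S(k,2) = M(k)/2;
    S(k,3) = D(k) - H(k)*(2*M(k) + M(k+1))/6;
    S(k,4) = Y(k);
end
disp(S)
```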

Example 5.9. Find the extrapolated cubic spline through (0, 0.0), (1, 0.5), (2, 2.0), and
(3, 1.5).

Use the values $\{h_k\}$, $\{d_k\}$, and $\{u_k\}$ from Example 5.7 with Lemma 5.3 and obtain the
linear system

$(3 + 2 + 1)m_1 + (1 - 1)m_2 = 6.0$,

$(1 - 1)m_1 + (2 + 3 + 1)m_2 = -12.0$.

The matrix form is

$\begin{bmatrix} 6.0 & 0.0 \\ 0.0 & 6.0 \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \end{bmatrix} = \begin{bmatrix} 6.0 \\ -12.0 \end{bmatrix}$,

and it is trivial to obtain $m_1 = 1.0$ and $m_2 = -2.0$. Now apply the equations in (iii) of
Table 5.8 to compute $m_0$ and $m_3$:

$m_0 = 1.0 - (-2.0 - 1.0) = 4.0$,
$m_3 = -2.0 + (-2.0 - 1.0) = -5.0$.

Finally, the values for $\{m_k\}$ are substituted in equations (16) to find the spline coefficients.
The solution is

(20)    $S_0(x) = -0.5x^3 + 2.0x^2 - x$  for $0 \le x \le 1$,
        $S_1(x) = -0.5(x-1)^3 + 0.5(x-1)^2 + 1.5(x-1) + 0.5$  for $1 \le x \le 2$,
        $S_2(x) = -0.5(x-2)^3 - (x-2)^2 + (x-2) + 2.0$  for $2 \le x \le 3$.

The extrapolated cubic spline is shown in Figure 5.14. ■
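The extrapolated case can be sketched for general step sizes as follows. The general-step form of the end formulas (linear extrapolation of $S''$ from the two nearest interior nodes) is an assumption that reduces to the unit-step arithmetic shown in this example; the variable names are illustrative.

```matlab
H = [1 1 1];   U = [6; -12];          % h_k and u_k from Example 5.7

% Lemma 5.3 coefficients written for N = 3 (two interior unknowns).
Hm = [3*H(1) + 2*H(2) + H(1)^2/H(2),   H(2) - H(1)^2/H(2);
      H(2) - H(3)^2/H(2),              2*H(2) + 3*H(3) + H(3)^2/H(2)];
M = Hm \ U;                            % M(1) = m_1, M(2) = m_2

% Assumed end formulas: extend the line through (x_1,m_1), (x_2,m_2) to x_0 and x_3.
m0 = M(1) - H(1)*(M(2) - M(1))/H(2);
m3 = M(2) + H(3)*(M(2) - M(1))/H(2);
fprintf('m0=%.1f  m1=%.1f  m2=%.1f  m3=%.1f\n', m0, M(1), M(2), m3);
```

With equal step sizes the off-diagonal entries vanish, which is why the system in the example is diagonal.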

Example 5.10. Find the parabolically terminated cubic spline through (0, 0.0), (1, 0.5),
(2, 2.0), and (3, 1.5).

Use $\{h_k\}$, $\{d_k\}$, and $\{u_k\}$ from Example 5.7 and then apply Lemma 5.4 to obtain

$(3 + 2)m_1 + m_2 = 6.0$,
$m_1 + (2 + 3)m_2 = -12.0$.

The matrix form is

$\begin{bmatrix} 5.0 & 1.0 \\ 1.0 & 5.0 \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \end{bmatrix} = \begin{bmatrix} 6.0 \\ -12.0 \end{bmatrix}$,

and the solution is $m_1 = 1.75$ and $m_2 = -2.75$. Since $S'''(x) \equiv 0$ on the subinterval at
each end, formulas (iv) in Table 5.8 imply that we have $m_0 = m_1 = 1.75$ and $m_3 = m_2 = -2.75$.
Then the values for $\{m_k\}$ are substituted in equations (16) to get the solution

(21)    $S_0(x) = 0.875x^2 - 0.375x$  for $0 \le x \le 1$,
        $S_1(x) = -0.75(x-1)^3 + 0.875(x-1)^2 + 1.375(x-1) + 0.5$  for $1 \le x \le 2$,
        $S_2(x) = -1.375(x-2)^2 + 0.875(x-2) + 2.0$  for $2 \le x \le 3$.

This parabolically terminated cubic spline is shown in Figure 5.15. ■ 

Example 5.11. Find the curvature-adjusted cubic spline through (0, 0.0), (1, 0.5),
(2, 2.0), and (3, 1.5) with the second derivative boundary conditions $S''(0) = -0.3$ and
$S''(3) = 3.3$.

Use $\{h_k\}$, $\{d_k\}$, and $\{u_k\}$ from Example 5.7 and then apply Lemma 5.5 to obtain

$2(1 + 1)m_1 + m_2 = 6.0 - (-0.3) = 6.3$,
$m_1 + 2(1 + 1)m_2 = -12.0 - (3.3) = -15.3$.

The matrix form is

$\begin{bmatrix} 4.0 & 1.0 \\ 1.0 & 4.0 \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \end{bmatrix} = \begin{bmatrix} 6.3 \\ -15.3 \end{bmatrix}$,

and the solution is $m_1 = 2.7$ and $m_2 = -4.5$. The given boundary conditions are used
to determine $m_0 = S''(0) = -0.3$ and $m_3 = S''(3) = 3.3$. Substitution of $\{m_k\}$ in
equations (16) produces the solution

(22)    $S_0(x) = 0.5x^3 - 0.15x^2 + 0.15x$  for $0 \le x \le 1$,
        $S_1(x) = -1.2(x-1)^3 + 1.35(x-1)^2 + 1.35(x-1) + 0.5$  for $1 \le x \le 2$,
        $S_2(x) = 1.3(x-2)^3 - 2.25(x-2)^2 + 0.45(x-2) + 2.0$  for $2 \le x \le 3$.

This curvature-adjusted cubic spline is shown in Figure 5.16. ■

Figure 5.15 The parabolically terminated cubic spline.

Figure 5.16 The curvature-adjusted cubic spline with $S''(0) = -0.3$ and $S''(3) = 3.3$.

Suitability of Cubic Splines 

A practical feature of splines is the minimum of the oscillatory behavior that they
possess. Consequently, among all functions $f(x)$ that are twice continuously differentiable
on $[a, b]$ and interpolate a given set of data points $\{(x_k, y_k)\}_{k=0}^{N}$, the cubic spline
has less wiggle. The next result explains this phenomenon.
Theorem 5.4 (Minimum Property of Cubic Splines). Assume that $f \in C^2[a, b]$
and $S(x)$ is the unique cubic spline interpolant for $f(x)$ that passes through the points
$\{(x_k, f(x_k))\}_{k=0}^{N}$ and satisfies the clamped end conditions $S'(a) = f'(a)$ and $S'(b) = f'(b)$. Then

(23)    $\int_a^b (S''(x))^2\,dx \le \int_a^b (f''(x))^2\,dx$.

Proof. Use integration by parts and the end conditions to obtain

$\int_a^b S''(x)(f''(x) - S''(x))\,dx = S''(x)(f'(x) - S'(x))\Big|_{x=a}^{x=b} - \int_a^b S'''(x)(f'(x) - S'(x))\,dx$

$\qquad = 0 - 0 - \int_a^b S'''(x)(f'(x) - S'(x))\,dx$.

Since $S'''(x) = 6s_{k,3}$ on the subinterval $[x_k, x_{k+1}]$, it follows that

$\int_{x_k}^{x_{k+1}} S'''(x)(f'(x) - S'(x))\,dx = 6s_{k,3}(f(x) - S(x))\Big|_{x=x_k}^{x=x_{k+1}} = 0$

for $k = 0, 1, \ldots, N-1$. Hence $\int_a^b S''(x)(f''(x) - S''(x))\,dx = 0$, and it follows that

(24)    $\int_a^b S''(x)f''(x)\,dx = \int_a^b (S''(x))^2\,dx$.

Since $0 \le (f''(x) - S''(x))^2$, we get the integral relationship

(25)    $0 \le \int_a^b (f''(x) - S''(x))^2\,dx = \int_a^b (f''(x))^2\,dx - 2\int_a^b f''(x)S''(x)\,dx + \int_a^b (S''(x))^2\,dx$.

Now the result in (24) is substituted into (25) and the result is

$0 \le \int_a^b (f''(x))^2\,dx - \int_a^b (S''(x))^2\,dx$.

This is easily rewritten to obtain the relation (23) and the result is proved.  •


The following program constructs a clamped cubic spline interpolant for the data
points $\{(x_k, y_k)\}_{k=0}^{N}$. The coefficients, in descending order, of $S_k(x)$, for $k = 0, 1,
\ldots, N-1$, are found in the $(k+1)$st row of the output matrix S. In the exercises the
reader will be asked to modify the program for the other end-point constraints listed in
Table 5.8 and described in Lemmas 5.2 through 5.5.
Program 5.3 (Clamped Cubic Spline). To construct and evaluate a clamped cubic
spline interpolant $S(x)$ for the $N+1$ data points $\{(x_k, y_k)\}_{k=0}^{N}$.

function S=csfit(X,Y,dx0,dxn)
%Input  - X is the 1xn abscissa vector
%       - Y is the 1xn ordinate vector
%       - dx0 = S'(x0) first derivative boundary condition
%       - dxn = S'(xn) first derivative boundary condition
%Output - S: rows of S are the coefficients, in descending
%            order, for the cubic interpolants
N=length(X)-1;
H=diff(X);
D=diff(Y)./H;
A=H(2:N-1);
B=2*(H(1:N-1)+H(2:N));
C=H(2:N);
U=6*diff(D);
%Clamped spline endpoint constraints
B(1)=B(1)-H(1)/2;
U(1)=U(1)-3*(D(1)-dx0);
B(N-1)=B(N-1)-H(N)/2;
U(N-1)=U(N-1)-3*(dxn-D(N));
%Forward elimination on the tridiagonal system
for k=2:N-1
   temp=A(k-1)/B(k-1);
   B(k)=B(k)-temp*C(k-1);
   U(k)=U(k)-temp*U(k-1);
end
%Back substitution; M(k+1) stores m_k
M(N)=U(N-1)/B(N-1);
for k=N-2:-1:1
   M(k+1)=(U(k)-C(k)*M(k+2))/B(k);
end
%End values m_0 and m_N from the clamped conditions
M(1)=3*(D(1)-dx0)/H(1)-M(2)/2;
M(N+1)=3*(dxn-D(N))/H(N)-M(N)/2;
%Spline coefficients, formulas (16)
for k=0:N-1
   S(k+1,1)=(M(k+2)-M(k+1))/(6*H(k+1));
   S(k+1,2)=M(k+1)/2;
   S(k+1,3)=D(k+1)-H(k+1)*(2*M(k+1)+M(k+2))/6;
   S(k+1,4)=Y(k+1);
end
Example 5.12. Find the clamped cubic spline that passes through (0, 0.0), (1, 0.5),
(2, 2.0), and (3, 1.5) with the first derivative boundary conditions $S'(0) = 0.2$ and $S'(3) = -1$.

In MATLAB:

>>X=[0 1 2 3]; Y=[0 0.5 2.0 1.5]; dx0=0.2; dxn=-1;
>>S=csfit(X,Y,dx0,dxn)
S =
    0.4800   -0.1800    0.2000         0
   -1.0400    1.2600    1.2800    0.5000
    0.6800   -1.8600    0.6800    2.0000

Notice that the rows of S are precisely the coefficients of the cubic spline interpolants in
equation (18) in Example 5.7. The following commands show how to plot the cubic spline
interpolant using the polyval command. The resulting graph is the same as Figure 5.12.

>>x1=0:.01:1; y1=polyval(S(1,:),x1-X(1));
>>x2=1:.01:2; y2=polyval(S(2,:),x2-X(2));
>>x3=2:.01:3; y3=polyval(S(3,:),x3-X(3));
>>plot(x1,y1,x2,y2,x3,y3,X,Y,'.')  ■
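One way to gain confidence in a computed coefficient matrix is to check that adjacent pieces agree in value, slope, and second derivative at the interior nodes. The sketch below does this for the matrix printed in this example; the variable name `jump` is hypothetical.

```matlab
S = [ 0.48 -0.18 0.20 0.0;
     -1.04  1.26 1.28 0.5;
      0.68 -1.86 0.68 2.0];
X = [0 1 2 3];

jump = zeros(1,2);
for k = 1:2
    h  = X(k+1) - X(k);
    p1 = polyder(S(k,:));             % first derivative of piece k
    p2 = polyder(p1);                 % second derivative of piece k
    % value, slope, second derivative of piece k at its right end
    left  = [polyval(S(k,:),h), polyval(p1,h), polyval(p2,h)];
    % the same quantities for piece k+1 at its left end (local variable 0)
    right = [S(k+1,4), S(k+1,3), 2*S(k+1,2)];
    jump(k) = max(abs(left - right));
end
disp(jump)
```

Both entries of `jump` should be at the level of rounding error if the rows describe a valid cubic spline.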


Exercises for Interpolation by Spline Functions

1. Consider the polynomial $S(x) = a_0 + a_1x + a_2x^2 + a_3x^3$.

(a) Show that the conditions $S(1) = 1$, $S'(1) = 0$, $S(2) = 2$, and $S'(2) = 0$ produce
the system of equations

$a_0 + a_1 + a_2 + a_3 = 1$
$a_1 + 2a_2 + 3a_3 = 0$
$a_0 + 2a_1 + 4a_2 + 8a_3 = 2$
$a_1 + 4a_2 + 12a_3 = 0$

(b) Solve the system in part (a) and graph the resulting cubic polynomial.

2. Consider the polynomial $S(x) = a_0 + a_1x + a_2x^2 + a_3x^3$.

(a) Show that the conditions $S(1) = 3$, $S'(1) = -4$, $S(2) = 1$, and $S'(2) = 2$
produce the system of equations

$a_0 + a_1 + a_2 + a_3 = 3$
$a_1 + 2a_2 + 3a_3 = -4$
$a_0 + 2a_1 + 4a_2 + 8a_3 = 1$
$a_1 + 4a_2 + 12a_3 = 2$

(b) Solve the system in part (a) and graph the resulting cubic polynomial.
3. Determine which of the following functions are cubic splines. Hint. Which, if any, of
the five parts of Definition 5.1 does a given function $f(x)$ not satisfy?

(a) $f(x) = \begin{cases} \ldots - \ldots\,x + 15x^2 - \ldots\,x^3 & \text{for } 1 \le x \le 2 \\ \ldots + \ldots\,x - 21x^2 + \ldots\,x^3 & \text{for } 2 \le x \le 3 \end{cases}$

(b) $f(x) = \begin{cases} 11 - 24x + 18x^2 - 4x^3 & \text{for } 1 \le x \le 2 \\ -54 + 72x - 30x^2 + 4x^3 & \text{for } 2 \le x \le 3 \end{cases}$

(c) $f(x) = \begin{cases} \ldots & \text{for } 1 \le x \le 2 \\ -70 + \ldots\,x - 40x^2 + \ldots\,x^3 & \text{for } 2 \le x \le 3 \end{cases}$

(d) $f(x) = \begin{cases} 13 - 31x + 23x^2 - 5x^3 & \text{for } 1 \le x \le 2 \\ -35 + 51x - 22x^2 + 3x^3 & \text{for } 2 \le x \le 3 \end{cases}$

4. Find the clamped cubic spline that passes through the points (-3, 2), (-2, 0), (1, 3),
and (4, 1) with the first derivative boundary conditions $S'(-3) = -1$ and $S'(4) = 1$.

5. Find the natural cubic spline that passes through the points (-3, 2), (-2, 0), (1, 3),
and (4, 1) with the free boundary conditions $S''(-3) = 0$ and $S''(4) = 0$.

6. Find the extrapolated cubic spline that passes through the points (-3, 2), (-2, 0),
(1, 3), and (4, 1).

7. Find the parabolically terminated cubic spline that passes through the points (-3, 2),
(-2, 0), (1, 3), and (4, 1).

8. Find the curvature-adjusted cubic spline that passes through the points (-3, 2),
(-2, 0), (1, 3), and (4, 1) with the second derivative boundary conditions $S''(-3) = -1$
and $S''(4) = 2$.

9. (a) Find the clamped cubic spline that passes through the points $\{(x_k, f(x_k))\}_{k=0}^{3}$
on the graph of $f(x) = x + \ldots$, using the nodes $x_0 = 1/2$, $x_1 = 1$, $x_2 = 3/2$,
and $x_3 = 2$. Use the first derivative boundary conditions $S'(x_0) = f'(x_0)$ and
$S'(x_3) = f'(x_3)$. Graph $f$ and the clamped cubic spline interpolant on the same
coordinate system.

(b) Find the natural cubic spline that passes through the points $\{(x_k, f(x_k))\}_{k=0}^{3}$ on
the graph of $f(x) = x + \ldots$, using the nodes $x_0 = 1/2$, $x_1 = 1$, $x_2 = 3/2$, and
$x_3 = 2$. Use the free boundary conditions $S''(x_0) = 0$ and $S''(x_3) = 0$. Graph
$f$ and the natural cubic spline interpolant on the same coordinate system.

10. (a) Find the clamped cubic spline that passes through the points $\{(x_k, f(x_k))\}_{k=0}^{3}$
on the graph of $f(x) = \cos(x^2)$, using the nodes $x_0 = 0$, $x_1 = \sqrt{\pi/2}$, $x_2 = \sqrt{3\pi/2}$,
and $x_3 = \sqrt{5\pi/2}$. Use the first derivative boundary conditions $S'(x_0) = f'(x_0)$
and $S'(x_3) = f'(x_3)$. Graph $f$ and the clamped cubic spline interpolant
on the same coordinate system.

(b) Find the natural cubic spline that passes through the points $\{(x_k, f(x_k))\}_{k=0}^{3}$
on the graph of $f(x) = \cos(x^2)$, using the nodes $x_0 = 0$, $x_1 = \sqrt{\pi/2}$, $x_2 = \sqrt{3\pi/2}$,
and $x_3 = \sqrt{5\pi/2}$. Use the free boundary conditions $S''(x_0) = 0$ and
$S''(x_3) = 0$. Graph $f$ and the natural cubic spline interpolant on the same
coordinate system.

11. Use the substitutions

$x_{k+1} - x = h_k + (x_k - x)$

and

$(x_{k+1} - x)^3 = h_k^3 + 3h_k^2(x_k - x) + 3h_k(x_k - x)^2 + (x_k - x)^3$

to show that when equation (8) is expanded into powers of $(x_k - x)$, the coefficients
are those given in equations (16).

12. Consider each cubic function $S_k(x)$ over the interval $[x_k, x_{k+1}]$.

(a) Give a formula for $\int_{x_k}^{x_{k+1}} S_k(x)\,dx$.

Then evaluate $\int_{x_0}^{x_3} S(x)\,dx$ in part (a) of

(b) Exercise 10 (c) Exercise 11

13. Show how strategy (i) in Table 5.8 and system (12) are combined to obtain the equa¬ 
tions in Lemma 5.1. 

14. Show how strategy (iii) in Table 5.8 and system (12) are combined to obtain the 
equation in Lemma 5.3. 

15. (a) Using the nodes $x_0 = -2$ and $x_1 = 0$, show that $f(x) = x^3 - x$ is its own
clamped cubic spline on the interval $[-2, 0]$.

(b) Using the nodes $x_0 = -2$, $x_1 = 0$, and $x_2 = 2$, show that $f(x) = x^3 - x$ is
its own clamped cubic spline on the interval $[-2, 2]$. Note. $f$ has an inflection
point at $x_1$.

(c) Use the results from parts (a) and (b) to show that any third-degree polynomial,
$f(x) = a_0 + a_1x + a_2x^2 + a_3x^3$, is its own clamped cubic spline on any closed
interval $[a, b]$.

(d) What, if anything, can be said about the other four types of cubic splines
described in Lemmas 5.2 through 5.5?


Algorithms and Programs 


1. The distance $d_k$ that a car traveled at time $t_k$ is given in the following table. Use
Program 5.3 with the first derivative boundary conditions $S'(0) = 0$ and $S'(8) = 98$
and find the clamped cubic spline for the points.

Time, $t_k$        0     2     4     6     8
Distance, $d_k$    0    40   160   300   480
2. Modify Program 5.3 to find the (a) natural, (b) extrapolated, (c) parabolically
terminated, or (d) end-point curvature-adjusted cubic splines for a given set of points.

3. Use your programs from Problem 2 to find the five different cubic splines for the
points (0, 1), (1, 0), (2, 0), (3, 1), (4, 2), (5, 2), and (6, 1), where $S'(0) = -0.6$,
$S'(6) = -1.8$, $S''(0) = 1$, and $S''(6) = -1$. Plot the five cubic splines and the points
on the same coordinate system.

4. Use your programs from Problem 2 to find the five different cubic splines for the
points (0, 0), (1, 4), (2, 8), (3, 9), (4, 9), (5, 8), and (6, 6), where $S'(0) = 1$,
$S'(6) = -2$, $S''(0) = 1$, and $S''(6) = -1$. Plot the five cubic splines and the points
on the same coordinate system.

5. The accompanying table gives the hourly temperature readings (Fahrenheit) during 
a 12-hour period in a suburb of Los Angeles. Find the natural cubic spline for the 
data. Graph the natural cubic spline and the data on the same coordinate system. Use 
the natural cubic spline and the results of part (a) of Exercise 12 to approximate the 
average temperature during the 12-hour period. 



6. Approximate the graph of $f(x) = x - \cos(x^3)$ over the interval $[-3, 3]$ using a
clamped cubic spline.


5.4 Fourier Series and Trigonometric Polynomials

Scientists and engineers often study physical phenomena, such as light and sound, that
have a periodic character. They are described by functions $g(x)$ that are periodic,

(1)    $g(x + P) = g(x)$  for all $x$.

The number $P$ is called a period of the function.

It will suffice to consider functions that have period $2\pi$. If $g(x)$ has period $P$, then
$f(x) = g(Px/2\pi)$ will be periodic with period $2\pi$. This is verified by the observation

(2)    $f(x + 2\pi) = g\left(\frac{Px}{2\pi} + P\right) = g\left(\frac{Px}{2\pi}\right) = f(x)$.
Figure 5.17 A continuous function $f(x)$ with period $2\pi$.

Figure 5.18 A piecewise continuous function over $[a, b]$.


Henceforth in this section we shall assume that $f(x)$ is a function that is periodic with
period $2\pi$, that is,

(3)    $f(x + 2\pi) = f(x)$  for all $x$.

The graph $y = f(x)$ is obtained by repeating the portion of the graph in any interval
of length $2\pi$, as shown in Figure 5.17.

Examples of functions with period $2\pi$ are $\sin(jx)$ and $\cos(jx)$, where $j$ is an
integer. This raises the following question: Can a periodic function be represented
by the sum of terms involving $a_j\cos(jx)$ and $b_j\sin(jx)$? We will soon see that the
answer is yes.

Definition 5.2 (Piecewise Continuous). The function $f(x)$ is said to be piecewise
continuous on $[a, b]$ if there exist values $t_0, t_1, \ldots, t_K$ with $a = t_0 < t_1 < \cdots < t_K = b$
such that $f(x)$ is continuous on each open interval $t_{i-1} < x < t_i$ for $i = 1, 2,
\ldots, K$, and $f(x)$ has left- and right-hand limits at each of the points $t_i$. The situation
is illustrated in Figure 5.18.


Definition 5.3 (Fourier Series). Assume that $f(x)$ is periodic with period $2\pi$ and
that $f(x)$ is piecewise continuous on $[-\pi, \pi]$. The Fourier series $S(x)$ for $f(x)$ is

(4)    $S(x) = \frac{a_0}{2} + \sum_{j=1}^{\infty} (a_j\cos(jx) + b_j\sin(jx))$,

where the coefficients $a_j$ and $b_j$ are computed with Euler's formulas:

(5)    $a_j = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos(jx)\,dx$  for $j = 0, 1, \ldots$

and

(6)    $b_j = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(jx)\,dx$  for $j = 1, 2, \ldots$.

The factor $\frac{1}{2}$ in the constant term $a_0/2$ in the Fourier series (4) has been introduced
for convenience so that $a_0$ could be obtained from the general formula (5) by setting
$j = 0$. Convergence of the Fourier series is discussed in the next result.

Theorem 5.5 (Fourier Expansion). Assume that $S(x)$ is the Fourier series for $f(x)$
over $[-\pi, \pi]$. If $f'(x)$ is piecewise continuous on $[-\pi, \pi]$ and has both a left- and
right-hand derivative at each point in this interval, then $S(x)$ is convergent for all
$x \in [-\pi, \pi]$. The relation

$S(x) = f(x)$

holds at all points $x \in [-\pi, \pi]$ where $f(x)$ is continuous. If $x = a$ is a point of
discontinuity of $f$, then

$S(a) = \frac{f(a^-) + f(a^+)}{2}$,

where $f(a^-)$ and $f(a^+)$ denote the left- and right-hand limits, respectively. With this
understanding, we obtain the Fourier expansion:

(7)    $f(x) = \frac{a_0}{2} + \sum_{j=1}^{\infty} (a_j\cos(jx) + b_j\sin(jx))$.

A brief outline of the derivation of formulas (5) and (6) is given at the end of the
section.
Example 5.13. Show that the function $f(x) = x/2$ for $-\pi < x < \pi$, extended periodically
by the equation $f(x + 2\pi) = f(x)$, has the Fourier series representation

$f(x) = \sum_{j=1}^{\infty} \frac{(-1)^{j+1}}{j}\sin(jx) = \sin(x) - \frac{\sin(2x)}{2} + \frac{\sin(3x)}{3} - \cdots$.

Using Euler's formulas and integration by parts, we get

$a_j = \frac{1}{\pi}\int_{-\pi}^{\pi} \frac{x}{2}\cos(jx)\,dx = \left[\frac{x\sin(jx)}{2\pi j} + \frac{\cos(jx)}{2\pi j^2}\right]_{-\pi}^{\pi} = 0$

for $j = 1, 2, 3, \ldots$, and

$b_j = \frac{1}{\pi}\int_{-\pi}^{\pi} \frac{x}{2}\sin(jx)\,dx = \left[\frac{-x\cos(jx)}{2\pi j} + \frac{\sin(jx)}{2\pi j^2}\right]_{-\pi}^{\pi} = \frac{(-1)^{j+1}}{j}$

for $j = 1, 2, 3, \ldots$. The coefficient $a_0$ is obtained by a separate calculation:

$a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} \frac{x}{2}\,dx = \frac{x^2}{4\pi}\Big|_{-\pi}^{\pi} = 0$.

These calculations show that all the coefficients of the cosine functions are zero. The
graph of $f(x)$ and the partial sums

$S_2(x) = \sin(x) - \frac{\sin(2x)}{2}$,

$S_3(x) = \sin(x) - \frac{\sin(2x)}{2} + \frac{\sin(3x)}{3}$,

$S_4(x) = \sin(x) - \frac{\sin(2x)}{2} + \frac{\sin(3x)}{3} - \frac{\sin(4x)}{4}$

are shown in Figure 5.19. ■
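The partial sums can be generated with a short loop. The following is a minimal sketch; the grid size and the variable names `n` and `Sn` are illustrative choices, not part of the text.

```matlab
% n-th partial sum of the series for f(x) = x/2:
%   S_n(x) = sum_{j=1}^{n} (-1)^(j+1) * sin(j*x) / j
x  = linspace(-pi, pi, 401);          % x(301) is the point x = pi/2
n  = 4;
Sn = zeros(size(x));
for j = 1:n
    Sn = Sn + (-1)^(j+1) * sin(j*x) / j;
end

% Largest deviation from f on the middle half of the interval.
mid = 101:301;                        % indices covering [-pi/2, pi/2]
err = max(abs(Sn(mid) - x(mid)/2));
disp(err)
```

A command such as `plot(x, x/2, x, Sn)` then gives a picture in the spirit of Figure 5.19; increasing `n` reduces the error away from the jump at $x = \pm\pi$.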

We now state some general properties of Fourier series. The proofs are left as
exercises.

Theorem 5.6 (Cosine Series). Suppose that $f(x)$ is an even function; that is, suppose
$f(-x) = f(x)$ holds for all $x$. If $f(x)$ has period $2\pi$ and if $f(x)$ and $f'(x)$ are
piecewise continuous, then the Fourier series for $f(x)$ involves only cosine terms:

(8)    $f(x) = \frac{a_0}{2} + \sum_{j=1}^{\infty} a_j\cos(jx)$,

where

(9)    $a_j = \frac{2}{\pi}\int_0^{\pi} f(x)\cos(jx)\,dx$  for $j = 0, 1, \ldots$.
Figure 5.19 The function $f(x) = x/2$ over $[-\pi, \pi]$ and its trigonometric
approximations $S_2(x)$, $S_3(x)$, and $S_4(x)$.


Theorem 5.7 (Sine Series). Suppose that $f(x)$ is an odd function; that is, $f(-x) =
-f(x)$ holds for all $x$. If $f(x)$ has period $2\pi$ and if $f(x)$ and $f'(x)$ are piecewise
continuous, then the Fourier series for $f(x)$ involves only the sine terms:

(10)    $f(x) = \sum_{j=1}^{\infty} b_j\sin(jx)$,

where

(11)    $b_j = \frac{2}{\pi}\int_0^{\pi} f(x)\sin(jx)\,dx$  for $j = 1, 2, \ldots$.

Example 5.14. Show that the function $f(x) = |x|$ for $-\pi < x < \pi$, extended periodically
by the equation $f(x + 2\pi) = f(x)$, has the Fourier cosine representation

(12)    $f(x) = \frac{\pi}{2} - \frac{4}{\pi}\sum_{j=1}^{\infty} \frac{\cos((2j-1)x)}{(2j-1)^2} = \frac{\pi}{2} - \frac{4}{\pi}\left(\cos(x) + \frac{\cos(3x)}{3^2} + \frac{\cos(5x)}{5^2} + \cdots\right)$.

The function $f(x)$ is an even function, so we can use Theorem 5.6 and need only to
compute the coefficients $\{a_j\}$:

$a_j = \frac{2}{\pi}\int_0^{\pi} x\cos(jx)\,dx = \left[\frac{2x\sin(jx)}{\pi j} + \frac{2\cos(jx)}{\pi j^2}\right]_0^{\pi} = \frac{2\cos(j\pi) - 2}{\pi j^2} = \frac{2((-1)^j - 1)}{\pi j^2}$

for $j = 1, 2, 3, \ldots$.

Since $((-1)^j - 1) = 0$ when $j$ is even, the cosine series will involve only the odd terms.
The odd coefficients have the pattern

$a_1 = \frac{-4}{\pi}, \quad a_3 = \frac{-4}{\pi 3^2}, \quad a_5 = \frac{-4}{\pi 5^2}, \quad \ldots$.

The coefficient $a_0$ is obtained by the separate calculation

$a_0 = \frac{2}{\pi}\int_0^{\pi} x\,dx = \frac{x^2}{\pi}\Big|_0^{\pi} = \pi$.

Therefore, we have found the desired coefficients in (12). ■ 
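The closed form for $a_j$ can be compared with a numerical evaluation of formula (9). This sketch uses the composite trapezoidal rule (`trapz`) on a fine grid, which is an illustrative choice of quadrature rather than a method prescribed here.

```matlab
% Numerical check of a_j for f(x) = |x|, which equals x on [0, pi].
x = linspace(0, pi, 20001);
J = 1:6;
a_num = zeros(size(J));
for j = J
    a_num(j) = (2/pi) * trapz(x, x .* cos(j*x));   % formula (9)
end
a_exact = 2*((-1).^J - 1) ./ (pi * J.^2);          % closed form from the example
disp([a_num; a_exact])
```

The even-indexed entries are zero up to quadrature error, and the odd ones follow the pattern $-4/(\pi j^2)$.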

Proof of Euler's Formulas for Theorem 5.5. The following heuristic argument assumes
the existence and convergence of the Fourier series representation. To determine
$a_0$, we can integrate both sides of (7) and get

(13)    $\int_{-\pi}^{\pi} f(x)\,dx = \int_{-\pi}^{\pi} \left(\frac{a_0}{2} + \sum_{j=1}^{\infty}(a_j\cos(jx) + b_j\sin(jx))\right)dx$

$\qquad = \int_{-\pi}^{\pi} \frac{a_0}{2}\,dx + \sum_{j=1}^{\infty} a_j\int_{-\pi}^{\pi}\cos(jx)\,dx + \sum_{j=1}^{\infty} b_j\int_{-\pi}^{\pi}\sin(jx)\,dx$

$\qquad = \pi a_0 + 0 + 0$.

Justification for switching the order of integration and summation requires a detailed
treatment of uniform convergence and can be found in advanced texts. Hence we have
shown that

(14)    $a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\,dx$.

To determine $a_m$, we let $m > 0$ be a fixed integer, multiply both sides of (7) by
$\cos(mx)$, and integrate both sides to obtain

(15)    $\int_{-\pi}^{\pi} f(x)\cos(mx)\,dx = \frac{a_0}{2}\int_{-\pi}^{\pi}\cos(mx)\,dx + \sum_{j=1}^{\infty} a_j\int_{-\pi}^{\pi}\cos(jx)\cos(mx)\,dx + \sum_{j=1}^{\infty} b_j\int_{-\pi}^{\pi}\sin(jx)\cos(mx)\,dx$.

Equation (15) can be simplified by using the orthogonal properties of the trigonometric
functions, which are now stated. The value of the first term on the right-hand side
of (15) is

(16)    $\frac{a_0}{2}\int_{-\pi}^{\pi}\cos(mx)\,dx = \frac{a_0\sin(mx)}{2m}\Big|_{-\pi}^{\pi} = 0$.

The value of the term involving $\cos(jx)\cos(mx)$ is found by using the trigonometric
identity

(17)    $\cos(jx)\cos(mx) = \frac{1}{2}\cos((j+m)x) + \frac{1}{2}\cos((j-m)x)$.

When $j \ne m$, then (17) is used to get

(18)    $a_j\int_{-\pi}^{\pi}\cos(jx)\cos(mx)\,dx = \frac{1}{2}a_j\int_{-\pi}^{\pi}\cos((j+m)x)\,dx + \frac{1}{2}a_j\int_{-\pi}^{\pi}\cos((j-m)x)\,dx = 0 + 0 = 0$.

When $j = m$, the value of the integral is

(19)    $a_m\int_{-\pi}^{\pi}\cos(mx)\cos(mx)\,dx = a_m\pi$.

The value of the term on the right side of (15) involving $\sin(jx)\cos(mx)$ is found
by using the trigonometric identity

(20)    $\sin(jx)\cos(mx) = \frac{1}{2}\sin((j+m)x) + \frac{1}{2}\sin((j-m)x)$.

For all values of $j$ and $m$ in (20), we obtain

(21)    $b_j\int_{-\pi}^{\pi}\sin(jx)\cos(mx)\,dx = \frac{1}{2}b_j\int_{-\pi}^{\pi}\sin((j+m)x)\,dx + \frac{1}{2}b_j\int_{-\pi}^{\pi}\sin((j-m)x)\,dx = 0 + 0 = 0$.

Therefore, using the results of (16), (18), (19), and (21) in equation (15), we conclude
that

(22)    $a_m\pi = \int_{-\pi}^{\pi} f(x)\cos(mx)\,dx$,  for $m = 1, 2, \ldots$.

Therefore, Euler's formula (5) is established. Euler's formula (6) is proved
similarly.  •

Trigonometric Polynomial Approximation 

Definition 5.4 (Trigonometric Polynomial). A series of the form

(23)    $T_M(x) = \frac{a_0}{2} + \sum_{j=1}^{M} (a_j\cos(jx) + b_j\sin(jx))$

is called a trigonometric polynomial of order $M$.
Theorem 5.8 (Discrete Fourier Series). Suppose that $\{(x_j, y_j)\}_{j=0}^{N}$ are $N+1$ points
where $y_j = f(x_j)$, and the abscissas are equally spaced:

(24)    $x_j = -\pi + \frac{2j\pi}{N}$  for $j = 0, 1, \ldots, N$.

If $f(x)$ is periodic with period $2\pi$ and $2M < N$, then there exists a trigonometric
polynomial $T_M(x)$ of the form (23) that minimizes the quantity

(25)    $\sum_{k=1}^{N} (f(x_k) - T_M(x_k))^2$.

The coefficients $a_j$ and $b_j$ of this polynomial are computed with the formulas

(26)    $a_j = \frac{2}{N}\sum_{k=1}^{N} f(x_k)\cos(jx_k)$  for $j = 0, 1, \ldots, M$,

and

(27)    $b_j = \frac{2}{N}\sum_{k=1}^{N} f(x_k)\sin(jx_k)$  for $j = 1, 2, \ldots, M$.


Although formulas (26) and (27) are defined with the least-squares procedure, they
can also be viewed as numerical approximations to the integrals in Euler's formulas (5)
and (6). Euler's formulas give the coefficients for the Fourier series of a continuous
function, whereas formulas (26) and (27) give the trigonometric polynomial coefficients
for curve fitting to data points. The next example uses data points generated by
the function $f(x) = x/2$ at discrete points. When more points are used, the trigonometric
polynomial coefficients get closer to the Fourier series coefficients.


Example 5.15. Use the 12 equally spaced points $x_k = -\pi + k\pi/6$, for $k = 1, 2, \ldots, 12$,
and find the trigonometric polynomial approximation for $M = 5$ to the 12 data points
$\{(x_k, f(x_k))\}_{k=1}^{12}$, where $f(x) = x/2$. Also compare the results when 60 and 360 points
are used and with the first five terms of the Fourier series expansion for $f(x)$ that is given
in Example 5.13.

Since the periodic extension is assumed, at a point of discontinuity the function value
$f(\pi)$ must be computed using the formula

(28)    $f(\pi) = \frac{f(\pi^-) + f(\pi^+)}{2} = \frac{\pi/2 - \pi/2}{2} = 0$.

The function $f(x)$ is an odd function; hence the coefficients for the cosine terms are all
zero (i.e., $a_j = 0$ for all $j$). The trigonometric polynomial of degree $M = 5$ involves only
the sine terms, and when formula (27) is used with (28), we get

(29)    $T_5(x) = 0.9770486\sin(x) - 0.4534498\sin(2x) + 0.26179938\sin(3x) - 0.1511499\sin(4x) + 0.0701489\sin(5x)$.


Figure 5.20 The trigonometric polynomial $T_5(x)$ of degree
$M = 5$, based on 12 data points that lie on the line $y = x/2$.


Table 5.9 Comparison of Trigonometric Polynomial Coefficients for
Approximations to $f(x) = x/2$ over $[-\pi, \pi]$

          Trigonometric polynomial coefficients          Fourier series
          12 points      60 points      360 points       coefficients
$b_1$      0.97704862     0.99908598     0.99997462       1.0
$b_2$     -0.45344984    -0.49817096    -0.49994923      -0.5
$b_3$      0.26179939     0.33058726     0.33325718       0.33333333
$b_4$     -0.15114995    -0.24633386    -0.24989845      -0.25
$b_5$      0.07014893     0.19540972     0.19987306       0.2

The graph of $T_5(x)$ is shown in Figure 5.20.

The coefficients of the fifth-degree trigonometric polynomial change slightly when the
number of interpolation points increases to 60 and 360. As the number of points increases,
they get closer to the coefficients of the Fourier series expansion of $f(x)$. The results are
compared in Table 5.9. ■
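The 12-point column can be reproduced directly from formula (27). This is a minimal sketch that works with the $N$ points $x_1, \ldots, x_N$ only and sets the ordinate at $x_N = \pi$ to the averaged value from (28); the variable names are illustrative.

```matlab
N  = 12;
k  = 1:N;
xk = -pi + 2*pi*k/N;                  % abscissas x_1..x_N from (24)
yk = xk/2;                            % f(x) = x/2
yk(N) = 0;                            % averaged value (28) at the jump x = pi

b = zeros(1,5);
for j = 1:5
    b(j) = (2/N) * sum(yk .* sin(j*xk));   % formula (27)
end
fprintf('%.8f\n', b);
```

Changing `N` to 60 or 360 gives the other two columns, which approach the Fourier coefficients $(-1)^{j+1}/j$.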

The following program constructs matrices A and B that contain the coefficients $a_j$
and $b_j$, respectively, of the trigonometric polynomial (23) of order $M$.
Program 5.4 (Trigonometric Polynomials). To construct the trigonometric polynomial
of order $M$ of the form

$P(x) = \frac{a_0}{2} + \sum_{j=1}^{M} (a_j\cos(jx) + b_j\sin(jx))$

based on the $N$ equally spaced values $x_k = -\pi + 2\pi k/N$, for $k = 1, 2, \ldots, N$. The
construction is possible provided that $2M + 1 \le N$.


function [A,B]=tpcoeff(X,Y,M)
%Input  - X is a vector of equally spaced abscissas in [-pi,pi]
%       - Y is a vector of ordinates
%       - M is the degree of the trigonometric polynomial
%Output - A is a vector containing the coefficients of cos(jx)
%       - B is a vector containing the coefficients of sin(jx)
N=length(X)-1;
max1=fix((N-1)/2);
if M>max1
   M=max1;
end
A=zeros(1,M+1);
B=zeros(1,M+1);
%The end points -pi and pi are the same point of the periodic
%extension; give each half of the averaged end value so that
%the point is counted once in the sums (26) and (27)
Yends=(Y(1)+Y(N+1))/2;
Y(1)=Yends/2;
Y(N+1)=Yends/2;
A(1)=sum(Y);
for j=1:M
   A(j+1)=cos(j*X)*Y';
   B(j+1)=sin(j*X)*Y';
end
A=2*A/N;
B=2*B/N;
A(1)=A(1)/2;


The following short program will evaluate the trigonometric polynomial P(x) of
order M from Program 5.4 at a particular value of x.

function z=tp(A,B,x,M)
z=A(1);
for j=1:M
   z=z+A(j+1)*cos(j*x)+B(j+1)*sin(j*x);
end

Executing the following commands in the MATLAB command window will produce a
graph analogous to Figure 5.20.

>>x=-pi:.01:pi;
>>y=tp(A,B,x,M);
>>plot(x,y,X,Y,'o')
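As a check on Example 5.15, the sine coefficients can also be computed directly from the discrete sums, without going through Program 5.4. The sketch below is only an illustration: it assumes the 12 abscissas $x_k = -\pi + 2\pi k/12$, takes the value at the jump $x = \pi$ to be the average 0, and uses the illustrative names N, M, x, y, and b, none of which come from the text.

```matlab
% Direct evaluation of b_j = (2/N) * sum_k f(x_k) sin(j x_k) for f(x) = x/2.
N = 12;                      % number of equally spaced points (assumed)
M = 5;                       % degree of the trigonometric polynomial
k = 1:N;
x = -pi + 2*pi*k/N;          % abscissas x_1, ..., x_N, with x_N = pi
y = x/2;                     % samples of f(x) = x/2
y(N) = 0;                    % at the jump, use the average of f(-pi) and f(pi)
b = zeros(1, M);
for j = 1:M
    b(j) = (2/N) * sum(y .* sin(j*x));
end
disp(b)                      % compare with the 12-point column of Table 5.9
```

Changing N to 60 or 360 gives values to set against the other columns of Table 5.9.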


Exercises for Fourier Series and Trigonometric Polynomials 


In Exercises 1 through 5, find the Fourier series representation of the given function.
Hint. Follow the procedures outlined in Examples 5.13 and 5.14. Graph each function
and the partial sums $S_2(x)$, $S_3(x)$, and $S_4(x)$ of its Fourier series representation on the
same coordinate system (see Figure 5.19).


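1. $f(x) = \begin{cases} -1 & \text{for } -\pi < x < 0 \\ 1 & \text{for } 0 < x < \pi \end{cases}$

2. $f(x) = \begin{cases} \frac{\pi}{2} + x & \text{for } -\pi \le x < 0 \\ \frac{\pi}{2} - x & \text{for } 0 \le x < \pi \end{cases}$

3. $f(x) = \begin{cases} 0 & \text{for } -\pi \le x < 0 \\ x & \text{for } 0 \le x < \pi \end{cases}$

4. $f(x) = \begin{cases} \cdots \\ \cdots & \text{for } -\pi < x < -\frac{\pi}{2} \end{cases}$

5. $f(x) = \begin{cases} -\pi - x & \text{for } -\pi \le x < -\frac{\pi}{2} \\ x & \text{for } -\frac{\pi}{2} \le x < \frac{\pi}{2} \\ \pi - x & \text{for } \frac{\pi}{2} \le x < \pi \end{cases}$

6. In Exercise 1, set $x = \pi/2$ and show that

$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots.$

7. In Exercise 2, set x = 0 and show that

$\frac{\pi^2}{8} = 1 + \frac{1}{3^2} + \frac{1}{5^2} + \frac{1}{7^2} + \cdots.$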

8. Find the Fourier cosine series representation for the periodic function whose definition
on one period is $f(x) = x^2/4$ where $-\pi \le x < \pi$.

9. Suppose that f(x) is a periodic function with period 2P; that is, f(x + 2P) = f(x)
for all x. By making an appropriate substitution, show that Euler's formulas (5) and
(6) for f are

$a_0 = \frac{1}{P} \int_{-P}^{P} f(x)\,dx$

$a_j = \frac{1}{P} \int_{-P}^{P} f(x) \cos\left(\frac{j\pi x}{P}\right) dx$  for j = 1, 2, ...

$b_j = \frac{1}{P} \int_{-P}^{P} f(x) \sin\left(\frac{j\pi x}{P}\right) dx$  for j = 1, 2, ....
In Exercises 10 through 12, use the results of Exercise 9 to find the Fourier series
representation of the given function. Graph f(x), $S_4(x)$, and $S_6(x)$ on the same coordinate
system.

10. $f(x) = \begin{cases} 0 & \text{for } -2 < x < 0 \\ 1 & \text{for } 0 < x < 2 \end{cases}$

11. $f(x) = \begin{cases} -1 & \text{for } -3 < x < -1 \\ |x| & \text{for } -1 < x < 1 \\ 1 & \text{for } 1 < x < 3 \end{cases}$

12. $f(x) = x^2 + 9$ for $-3 < x < 3$.

13. Prove Theorem 5.6. 


14. Prove Theorem 5.7. 


Algorithms and Programs 


1. Use Program 5.4 with N = 12 points and follow Example 5.15 to find the trigonometric
polynomial of degree M = 5 for the equally spaced points $\{(x_k, f(x_k))\}_{k=1}^{12}$,
where f(x) is the function in (a) Exercise 1, (b) Exercise 2, (c) Exercise 3, and
(d) Exercise 4. In each case, produce a graph of f(x), $T_5(x)$, and $\{(x_k, f(x_k))\}_{k=1}^{12}$
on the same coordinate system.

2. Use Program 5.4 to find the coefficients of $T_5(x)$ in Example 5.15 when first 60 and
then 360 equally spaced points are used.

3. Modify Program 5.4 so that it will find the trigonometric polynomial of period 2P =
b - a when the data points are equally spaced over the interval [a, b].

4. Use Program 5.4 to find $T_5(x)$ for (a) f(x) in Exercise 10, using 12 equally spaced
data points, and (b) f(x) in Exercise 12, using 60 equally spaced data points. In each
case, graph $T_5(x)$ and the data points on the same coordinate system.

5. The temperature cycle (Fahrenheit) in a suburb of Los Angeles on November 8 is
given in Table 5.10. There are 24 data points.

(a) Find the trigonometric polynomial $T_7(x)$.

(b) Graph $T_7(x)$ and the 24 data points on the same coordinate system.

(c) Repeat parts (a) and (b) using temperatures from your locale.

6. The yearly temperature cycle (Fahrenheit) for Fairbanks, Alaska, is given in Table 5.11.
There are 13 equally spaced data points, which correspond to a measurement
every 28 days.

(a) Find the trigonometric polynomial $T_6(x)$.

(b) Graph $T_6(x)$ and the 13 data points on the same coordinate system.
Table 5.10 Data for Problem 5

Time, p.m.   Degrees   Time, a.m.   Degrees
1            66        1            58
2            66        2            58
3            65        3            58
4            64        4            58
5            63        5            57
6            63        6            57
7            62        7            57
8            61        8            58
9            60        9            60
10           60        10           64
11           59        11           67
Midnight     58        Noon         68

Table 5.11 Data for Problem 6

Calendar date   Average degrees
Jan. 1          -14
Jan. 29         -9
Feb. 26         2
Mar. 26         15
Apr. 23         35
May 21          52
June 18         62
July 16         63
Aug. 13         58
Sept. 10        50
Oct. 8          34
Nov. 5          12
Dec. 3          -5
Numerical Differentiation


Formulas for numerical derivatives are important in developing algorithms for solving
boundary value problems for ordinary differential equations and partial differential
equations (see Chapters 9 and 10). Standard examples of numerical differentiation
often use known functions so that the numerical approximation can be compared
with the exact answer. For illustration, we use the Bessel function $J_1(x)$, whose
tabulated values can be found in standard reference books. Eight equally spaced
points over [0, 7] are (0, 0.0000), (1, 0.4400), (2, 0.5767), (3, 0.3391), (4, -0.0660),
(5, -0.3276), (6, -0.2767), and (7, -0.0047). The underlying principle is differentiation
of an interpolation polynomial. Let us focus our attention on finding $J_1'(2)$. The
interpolation polynomial $p_2(x) = -0.0710 + 0.6982x - 0.1872x^2$ passes through the
three points (1, 0.4400), (2, 0.5767), and (3, 0.3391) and is used to obtain $J_1'(2) \approx
p_2'(2) = -0.0505$. This quadratic polynomial $p_2(x)$ and its tangent line at $(2, J_1(2))$
are shown in Figure 6.1(a). If five interpolation points are used, a better approximation
can be determined. The polynomial $p_4(x) = 0.4986x + 0.011x^2 - 0.0813x^3 + 0.0116x^4$
passes through (0, 0.0000), (1, 0.4400), (2, 0.5767), (3, 0.3391), and (4, -0.0660)
and is used to obtain $J_1'(2) \approx p_4'(2) = -0.0618$. The quartic polynomial $p_4(x)$ and its
tangent line at $(2, J_1(2))$ are shown in Figure 6.1(b). The true value for the derivative
is $J_1'(2) = -0.0645$, and the errors in $p_2(x)$ and $p_4(x)$ are -0.0140 and -0.0026,
respectively. In this chapter we develop the introductory theory needed to investigate
the accuracy of numerical differentiation.


Figure 6.1 (a) The tangent to $p_2(x)$ at (2, 0.5767) with slope $p_2'(2) = -0.0505$.
(b) The tangent to $p_4(x)$ at (2, 0.5767) with slope $p_4'(2) = -0.0618$.
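The idea of differentiating an interpolation polynomial can be tried directly on the tabulated Bessel values. The following is a minimal sketch, assuming the standard MATLAB functions polyfit, polyder, and polyval are used to build and differentiate the interpolants; the variable names are illustrative and the input values are the four-decimal table entries, so the results agree with the text only to about that accuracy.

```matlab
% Tabulated values of J_1(x) at x = 0, 1, 2, 3, 4 (four decimals, from the text).
xs = [0 1 2 3 4];
ys = [0.0000 0.4400 0.5767 0.3391 -0.0660];

% Quadratic through the three points centered at x = 2.
c2  = polyfit(xs(2:4), ys(2:4), 2);
dp2 = polyval(polyder(c2), 2);      % slope p_2'(2)

% Quartic through all five points.
c4  = polyfit(xs, ys, 4);
dp4 = polyval(polyder(c4), 2);      % slope p_4'(2)

fprintf('p2''(2) = %.5f, p4''(2) = %.5f\n', dp2, dp4);
```

Because each polynomial has exactly as many coefficients as data points, polyfit returns the interpolating polynomial rather than a least-squares fit.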


6.1 Approximating the Derivative


The Limit of the Difference Quotient 

We now turn our attention to the numerical process for approximating the derivative
of f(x):

(1)  $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}.$

The method seems straightforward: choose a sequence $\{h_k\}$ so that $h_k \to 0$ and
compute the limit of the sequence

(2)  $D_k = \frac{f(x+h_k) - f(x)}{h_k}$  for k = 1, 2, ..., n, ....

The reader may notice that we will only compute a finite number of terms $D_1, D_2, \ldots,
D_N$ in the sequence (2), and it appears that we should use $D_N$ for our answer. The
following question is often posed: Why compute $D_1, D_2, \ldots, D_{N-1}$? Equivalently,
we could ask: What value $h_N$ should be chosen so that $D_N$ is a good approximation to
the derivative $f'(x)$? To answer this question, we must look at an example to see why
there is no simple solution.

For example, consider the function $f(x) = e^x$ and use the step sizes h = 1,
1/2, and 1/4 to construct the secant lines between the points (0, 1) and (h, f(h)),
respectively. As h gets small, the secant line approaches the tangent line as shown in
Figure 6.2. Although Figure 6.2 gives a good visualization of the process described
in (1), we must make numerical computations with h = 0.00001 to get an acceptable
numerical answer, and for this value of h the graphs of the tangent line and secant line
would be indistinguishable.
Example 6.1. Let $f(x) = e^x$ and x = 1. Compute the difference quotients $D_k$ using the
step sizes $h_k = 10^{-k}$ for k = 1, 2, ..., 10. Carry out nine decimal places in all calculations.

A table of the values $f(1 + h_k)$ and $(f(1 + h_k) - f(1))/h_k$ that are used in the
computation of $D_k$ is shown in Table 6.1. ■

The largest value $h_1 = 0.1$ does not produce a good approximation $D_1 \approx f'(1)$,
because the step size $h_1$ is too large and the difference quotient is the slope of the secant
line through two points that are not close enough to each other. When formula (2) is
used with a fixed precision of nine decimal places, $h_9$ produced the approximation
$D_9 = 3$ and $h_{10}$ produced $D_{10} = 0$. If $h_k$ is too small, then the computed function
values $f(x + h_k)$ and f(x) are very close together. The difference $f(x + h_k) - f(x)$
can exhibit the problem of loss of significance due to the subtraction of quantities
that are nearly equal. The value $h_{10} = 10^{-10}$ is so small that the stored values of
$f(x + h_{10})$ and f(x) are the same, and hence the computed difference quotient is zero.
In Example 6.1 the mathematical value for the limit is $f'(1) \approx 2.718281828$. Observe
that the value $h_5 = 10^{-5}$ gives the best approximation, $D_5 = 2.7183$.

Example 6.1 shows that it is not easy to find numerically the limit in equation (2).
The sequence starts to converge to e, and $D_5$ is the closest; then the terms move away
from e. In Program 6.1 it is suggested that terms in the sequence $\{D_k\}$ should be
computed until $|D_{N+1} - D_N| \ge |D_N - D_{N-1}|$. This is an attempt to determine the best
approximation before the terms start to move away from the limit. When this criterion
is applied to Example 6.1, we have $0.0007 = |D_6 - D_5| > |D_5 - D_4| = 0.00012$;
hence $D_5$ is the answer we choose. We now proceed to develop formulas that give a
reasonable amount of accuracy for larger values of h.
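The behavior described in Example 6.1 can be explored with a short script. This is a sketch only: it runs in double precision rather than the nine-decimal arithmetic of the example, so the turning point typically occurs at a smaller step size than $h_5$, and the names D, err, and N are illustrative.

```matlab
f = @(x) exp(x);
x = 1;
n = 10;
D = zeros(1, n);
for k = 1:n
    h = 10^(-k);
    D(k) = (f(x + h) - f(x)) / h;   % difference quotient (2) with h_k = 10^(-k)
end
err = abs(D - exp(1));              % actual errors, available since f'(1) = e

% Stopping rule from the text: first N with |D(N+1)-D(N)| >= |D(N)-D(N-1)|.
N = n;
for k = 2:n-1
    if abs(D(k+1) - D(k)) >= abs(D(k) - D(k-1))
        N = k;
        break
    end
end
disp([(1:n)' D' err'])
disp(N)
```

The error column first decreases with h and then grows again once round-off dominates, which is the pattern the stopping rule tries to detect.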


The Central-difference Formulas 

If the function f(x) can be evaluated at values that lie to the left and right of x, then
the best two-point formula will involve abscissas that are chosen symmetrically on both
sides of x.

Theorem 6.1 (Centered Formula of Order $O(h^2)$). Assume that $f \in C^3[a, b]$ and
that $x - h, x, x + h \in [a, b]$. Then

(3)  $f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}.$

Furthermore, there exists a number $c = c(x) \in [a, b]$ such that

(4)  $f'(x) = \frac{f(x+h) - f(x-h)}{2h} + E_{\text{trunc}}(f, h),$

where

$E_{\text{trunc}}(f, h) = -\frac{h^2 f^{(3)}(c)}{6} = O(h^2).$

The term E(f, h) is called the truncation error.

Proof. Start with the second-degree Taylor expansions $f(x) = P_2(x) + E_2(x)$, about x,
for f(x + h) and f(x - h):

(5)  $f(x+h) = f(x) + f'(x)h + \frac{f^{(2)}(x)h^2}{2!} + \frac{f^{(3)}(c_1)h^3}{3!}$

and

(6)  $f(x-h) = f(x) - f'(x)h + \frac{f^{(2)}(x)h^2}{2!} - \frac{f^{(3)}(c_2)h^3}{3!}.$

Subtracting (6) from (5) gives

(7)  $f(x+h) - f(x-h) = 2f'(x)h + \frac{(f^{(3)}(c_1) + f^{(3)}(c_2))h^3}{3!}.$

Since $f^{(3)}(x)$ is continuous, the intermediate value theorem can be used to find a
value c so that

(8)  $\frac{f^{(3)}(c_1) + f^{(3)}(c_2)}{2} = f^{(3)}(c).$

This can be substituted into (7) and the terms rearranged to yield

(9)  $f'(x) = \frac{f(x+h) - f(x-h)}{2h} - \frac{f^{(3)}(c)h^2}{3!}.$

The first term on the right side of (9) is the central-difference formula (3), the second
term is the truncation error, and the proof is complete. •

Suppose that the value of the third derivative $f^{(3)}(c)$ does not change too rapidly;
then the truncation error in (4) goes to zero in the same manner as $h^2$, which is
expressed by using the notation $O(h^2)$. When computer calculations are used, it is not
desirable to choose h too small. For this reason it is useful to have a formula for
approximating $f'(x)$ that has a truncation error term of the order $O(h^4)$.

Theorem 6.2 (Centered Formula of Order $O(h^4)$). Assume that $f \in C^5[a, b]$ and
that $x - 2h, x - h, x, x + h, x + 2h \in [a, b]$. Then

(10)  $f'(x) \approx \frac{-f(x+2h) + 8f(x+h) - 8f(x-h) + f(x-2h)}{12h}.$

Furthermore, there exists a number $c = c(x) \in [a, b]$ such that

(11)  $f'(x) = \frac{-f(x+2h) + 8f(x+h) - 8f(x-h) + f(x-2h)}{12h} + E_{\text{trunc}}(f, h),$

where

$E_{\text{trunc}}(f, h) = \frac{h^4 f^{(5)}(c)}{30} = O(h^4).$

Proof. One way to derive formula (10) is as follows. Start with the difference between
the fourth-degree Taylor expansions $f(x) = P_4(x) + E_4(x)$, about x, of f(x + h) and
f(x - h):

(12)  $f(x+h) - f(x-h) = 2f'(x)h + \frac{2f^{(3)}(x)h^3}{3!} + \frac{2f^{(5)}(c_1)h^5}{5!}.$

Then use the step size 2h, instead of h, and write down the following approximation:

(13)  $f(x+2h) - f(x-2h) = 4f'(x)h + \frac{16f^{(3)}(x)h^3}{3!} + \frac{64f^{(5)}(c_2)h^5}{5!}.$

Next multiply the terms in equation (12) by 8 and subtract (13) from it. The terms
involving $f^{(3)}(x)$ will be eliminated and we get

(14)  $-f(x+2h) + 8f(x+h) - 8f(x-h) + f(x-2h) = 12f'(x)h + \frac{(16f^{(5)}(c_1) - 64f^{(5)}(c_2))h^5}{120}.$

If $f^{(5)}(x)$ has one sign and if its magnitude does not change rapidly, we can find a
value c that lies in [x - 2h, x + 2h] so that

(15)  $16f^{(5)}(c_1) - 64f^{(5)}(c_2) = -48f^{(5)}(c).$

After (15) is substituted into (14) and the result is solved for $f'(x)$, we obtain

(16)  $f'(x) = \frac{-f(x+2h) + 8f(x+h) - 8f(x-h) + f(x-2h)}{12h} + \frac{f^{(5)}(c)h^4}{30}.$

The first term on the right side of (16) is the central-difference formula (10), and
the second term is the truncation error; the theorem is proved. •


Suppose that $|f^{(5)}(c)|$ is bounded for $c \in [a, b]$; then the truncation error in (11)
goes to zero in the same manner as $h^4$, which is expressed with the notation $O(h^4)$.
Now we can make a comparison of the two formulas (3) and (10). Suppose that f(x)
has five continuous derivatives and that $|f^{(3)}(c)|$ and $|f^{(5)}(c)|$ are about the same.
Then the truncation error for the fourth-order formula (10) is $O(h^4)$ and will go to
zero faster than the truncation error $O(h^2)$ for the second-order formula (3). This
permits the use of a larger step size.


Example 6.2. Let f(x) = cos(x).

(a) Use formulas (3) and (10) with step sizes h = 0.1, 0.01, 0.001, and 0.0001, and
calculate approximations for $f'(0.8)$. Carry nine decimal places in all the calculations.
(b) Compare with the true value $f'(0.8) = -\sin(0.8)$.

(a) Using formula (3) with h = 0.01, we get

$f'(0.8) \approx \frac{f(0.81) - f(0.79)}{0.02} \approx \frac{0.689498433 - 0.703845316}{0.02} \approx -0.717344150.$

Using formula (10) with h = 0.01, we get

$f'(0.8) \approx \frac{-f(0.82) + 8f(0.81) - 8f(0.79) + f(0.78)}{0.12}$

$\approx \frac{-0.682221207 + 8(0.689498433) - 8(0.703845316) + 0.710913538}{0.12}$

$\approx -0.717356108.$
Table 6.2 Numerical Differentiation Using Formulas (3) and (10)

Step     Approximation by   Error using     Approximation by   Error using
size     formula (3)        formula (3)     formula (10)       formula (10)
0.1      -0.716161095       -0.001194996    -0.717353703       -0.000002389
0.01     -0.717344150       -0.000011941    -0.717356108        0.000000017
0.001    -0.717356000       -0.000000091    -0.717356167        0.000000076
0.0001   -0.717360000       -0.000003909    -0.717360833        0.000004742


(b) The error in approximation for formulas (3) and (10) turns out to be -0.000011941 and
0.000000017, respectively. In this example, formula (10) gives a better approximation to
$f'(0.8)$ than formula (3) when h = 0.01. The error analysis will illuminate this example
and show why this happened. The other calculations are summarized in Table 6.2. ■
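The comparison in Example 6.2 can be repeated with a few lines of MATLAB. This sketch assumes double-precision arithmetic, so the rows for the smallest step sizes will not show the nine-decimal round-off effects recorded in Table 6.2; the names d3 and d10 are illustrative, and the error is taken as the true value minus the approximation, matching the sign convention of the table.

```matlab
f  = @(x) cos(x);
x0 = 0.8;
exact = -sin(x0);                       % true derivative f'(0.8)
hs  = [0.1 0.01 0.001 0.0001];
d3  = zeros(size(hs));                  % approximations from formula (3)
d10 = zeros(size(hs));                  % approximations from formula (10)
for i = 1:numel(hs)
    h = hs(i);
    d3(i)  = (f(x0+h) - f(x0-h)) / (2*h);
    d10(i) = (-f(x0+2*h) + 8*f(x0+h) - 8*f(x0-h) + f(x0-2*h)) / (12*h);
end
% Columns: h, formula (3), its error, formula (10), its error.
disp([hs' d3' (exact-d3)' d10' (exact-d10)'])
```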


Error Analysis and Optimum Step Size 

An important topic in the study of numerical differentiation is the effect of the
computer's round-off error. Let us examine the formulas more closely. Assume that a
computer is used to make numerical computations and that

$f(x_0 - h) = y_{-1} + e_{-1}$ and $f(x_0 + h) = y_1 + e_1,$

where $f(x_0 - h)$ and $f(x_0 + h)$ are approximated by the numerical values $y_{-1}$ and $y_1$,
and $e_{-1}$ and $e_1$ are the associated round-off errors, respectively. The following result
indicates the complex nature of error analysis for numerical differentiation.

Corollary 6.1(a). Assume that f satisfies the hypotheses of Theorem 6.1 and use the
computational formula

(17)  $f'(x_0) \approx \frac{y_1 - y_{-1}}{2h}.$

The error analysis is explained by the following equations:

(18)  $f'(x_0) = \frac{y_1 - y_{-1}}{2h} + E(f, h),$

where

(19)  $E(f, h) = E_{\text{round}}(f, h) + E_{\text{trunc}}(f, h) = \frac{e_1 - e_{-1}}{2h} - \frac{h^2 f^{(3)}(c)}{6},$

where the total error term E(f, h) has a part due to round-off error plus a part due to
truncation error.

Corollary 6.1(b). Assume that f satisfies the hypotheses of Theorem 6.1 and that
numerical computations are made. If $|e_{-1}| \le \epsilon$, $|e_1| \le \epsilon$, and $M = \max_{a \le x \le b}\{|f^{(3)}(x)|\}$,
then

(20)  $|E(f, h)| \le \frac{\epsilon}{h} + \frac{Mh^2}{6},$

and the value of h that minimizes the right-hand side of (20) is

(21)  $h = \left(\frac{3\epsilon}{M}\right)^{1/3}.$
When h is small, the portion of (19) involving $(e_1 - e_{-1})/2h$ can be relatively
large. In Example 6.2, when h = 0.0001, this difficulty was encountered. The round-off
errors are

$f(0.8001) = 0.696634970 + e_1$ where $e_1 \approx -0.0000000003$

$f(0.7999) = 0.696778442 + e_{-1}$ where $e_{-1} \approx 0.0000000005.$

The truncation error term is

$\frac{-h^2 f^{(3)}(c)}{6} \approx -(0.0001)^2 \frac{\sin(0.8)}{6} \approx -0.000000001.$

The error term E(f, h) in (19) can now be estimated:

$E(f, h) \approx \frac{-0.0000000003 - 0.0000000005}{0.0002} - 0.000000001 = -0.000004001.$

Indeed, the computed numerical approximation for the derivative using h = 0.0001
is found by the calculation

$f'(0.8) \approx \frac{f(0.8001) - f(0.7999)}{0.0002} = \frac{0.696634970 - 0.696778442}{0.0002} = -0.717360000,$

and a loss of about four significant digits is evident. The error is -0.000003909 and
this is close to the predicted error, -0.000004001.

When formula (21) is applied to Example 6.2, we can use the bound $|f^{(3)}(x)| \le
|\sin(x)| \le 1 = M$ and the value $\epsilon = 0.5 \times 10^{-9}$ for the magnitude of the round-off
error. The optimal value for h is easily calculated: $h = (1.5 \times 10^{-9}/1)^{1/3} =
0.001144714$. The step size h = 0.001 was closest to the optimal value 0.001144714
and it gave the best approximation to $f'(0.8)$ among the four choices involving
formula (3) (see Table 6.2 and Figure 6.3).
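The optimal step size quoted above is easy to reproduce. The sketch below assumes that the total error bound has the form $\epsilon/h + Mh^2/6$, whose minimizer is $(3\epsilon/M)^{1/3}$; this is consistent with the value $(1.5 \times 10^{-9}/1)^{1/3}$ used in the text. The names epsilon, h_opt, and bound are illustrative.

```matlab
epsilon = 0.5e-9;                      % bound on the round-off error of each value
M = 1;                                 % bound on |f'''(x)| for f(x) = cos(x)
h_opt = (3*epsilon/M)^(1/3);           % minimizer of the assumed bound
bound = @(h) epsilon./h + M*h.^2/6;    % assumed round-off plus truncation bound
hs = [0.1 0.01 0.001 0.0001];          % step sizes used in Example 6.2
fprintf('optimal h = %.9f\n', h_opt);
disp([hs' bound(hs)'])                 % bound at each tabulated step size
```

Of the four tabulated step sizes, h = 0.001 gives the smallest value of the bound, in line with the discussion of Table 6.2.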

An error analysis of formula (10) is similar. Assume that a computer is used to
make numerical computations and that $f(x_0 + kh) = y_k + e_k$.

Figure 6.3 Finding the optimum step size h = 0.001144714 when
formula (21) is applied to f(x) = cos(x) in Example 6.2.

Figure 6.4 Finding the optimum step size h = 0.022388475 when
formula (26) is applied to f(x) = cos(x) in Example 6.2.


When formula (26) is applied to Example 6.2, we can use the bound $|f^{(5)}(x)| \le
|\sin(x)| \le 1 = M$ and the value $\epsilon = 0.5 \times 10^{-9}$ for the magnitude of the round-off
error. The optimal value for h is easily calculated: $h = (22.5 \times 10^{-9}/4)^{1/5} =
0.022388475$. The step size h = 0.01 was closest to the optimal value 0.022388475,
and it gave the best approximation to $f'(0.8)$ among the four choices involving
formula (10) (see Table 6.2 and Figure 6.4).

We should not end the discussion of Example 6.2 without mentioning that numerical
differentiation formulas can be obtained by an alternative derivation. They can
be derived by differentiation of an interpolation polynomial. For example, the Lagrange
form of the quadratic polynomial $p_2(x)$ that passes through the three points
(0.7, cos(0.7)), (0.8, cos(0.8)), and (0.9, cos(0.9)) is

$p_2(x) = 38.2421094(x - 0.8)(x - 0.9) - 69.6706709(x - 0.7)(x - 0.9) + 31.0804984(x - 0.7)(x - 0.8).$

This polynomial can be expanded to obtain the usual form:

$p_2(x) = 1.046875165 - 0.159260044x - 0.348063157x^2.$

A similar computation can be used to obtain the quartic polynomial $p_4(x)$ that passes
through the points (0.6, cos(0.6)), (0.7, cos(0.7)), (0.8, cos(0.8)), (0.9, cos(0.9)), and
(1.0, cos(1.0)):

$p_4(x) = 0.998452927 + 0.009638391x - 0.523291341x^2 + 0.026521229x^3 + 0.028981100x^4.$
Figure 6.5 (a) The graph of y = cos(x) and the interpolating polynomial $p_2(x)$ used
to estimate $f'(0.8) \approx p_2'(0.8) = -0.716161095$. (b) The graph of y = cos(x) and the
interpolating polynomial $p_4(x)$ used to estimate $f'(0.8) \approx p_4'(0.8) = -0.717353703$.


When these polynomials are differentiated, they produce $p_2'(0.8) = -0.716161095$
and $p_4'(0.8) = -0.717353703$, which agree with the values listed under h = 0.1 in
Table 6.2. The graphs of $p_2(x)$ and $p_4(x)$ and their tangent lines at (0.8, cos(0.8)) are
shown in Figure 6.5(a) and (b), respectively.

Richardson's Extrapolation 

In this section we emphasize the relationship between formulas (3) and (10). Let
$f_k = f(x_k) = f(x_0 + kh)$, and use the notation $D_0(h)$ and $D_0(2h)$ to denote the
approximations to $f'(x_0)$ that are obtained from (3) with step sizes h and 2h,
respectively:

(27)  $f'(x_0) \approx D_0(h) + Ch^2$

and

(28)  $f'(x_0) \approx D_0(2h) + 4Ch^2.$

If we multiply relation (27) by 4 and subtract relation (28) from this product, then the
terms involving C cancel and the result is

(29)  $3f'(x_0) \approx 4D_0(h) - D_0(2h) = \frac{4(f_1 - f_{-1})}{2h} - \frac{f_2 - f_{-2}}{4h}.$

Next solve for $f'(x_0)$ in (29) and get

(30)  $f'(x_0) \approx \frac{4D_0(h) - D_0(2h)}{3} = \frac{-f_2 + 8f_1 - 8f_{-1} + f_{-2}}{12h}.$

The last expression in (30) is the central-difference formula (10).




Example 6.3. Let f(x) = cos(x). Use (27) and (28) with h = 0.01, and show how the
linear combination $(4D_0(h) - D_0(2h))/3$ in (30) can be used to obtain the approximation
to $f'(0.8)$ given in (10). Carry nine decimal places in all the calculations.

Use (27) and (28) with h = 0.01 to get

$D_0(h) \approx \frac{f(0.81) - f(0.79)}{0.02} \approx \frac{0.689498433 - 0.703845316}{0.02} \approx -0.717344150$

and

$D_0(2h) \approx \frac{f(0.82) - f(0.78)}{0.04} \approx \frac{0.682221207 - 0.710913538}{0.04} \approx -0.717308275.$

Now the linear combination in (30) is computed:

$f'(0.8) \approx \frac{4D_0(h) - D_0(2h)}{3} \approx \frac{4(-0.717344150) - (-0.717308275)}{3} \approx -0.717356108.$

This is exactly the same as the solution in Example 6.2 that used (10) directly to
approximate $f'(0.8)$. ■
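The arithmetic of Example 6.3 can be checked with the nine-decimal function values quoted in the example. This is a small sketch with illustrative variable names; it also evaluates formula (10) directly to confirm that the two routes agree.

```matlab
h   = 0.01;
f79 = 0.703845316;  f81 = 0.689498433;   % cos(0.79), cos(0.81) to nine decimals
f78 = 0.710913538;  f82 = 0.682221207;   % cos(0.78), cos(0.82) to nine decimals

D0h  = (f81 - f79) / (2*h);              % formula (3) with step size h
D02h = (f82 - f78) / (4*h);              % formula (3) with step size 2h
rich = (4*D0h - D02h) / 3;               % linear combination in (30)

direct = (-f82 + 8*f81 - 8*f79 + f78) / (12*h);   % formula (10) directly
fprintf('%.9f  %.9f\n', rich, direct);
```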

The method of obtaining a formula for $f'(x_0)$ of higher order from a formula of
lower order is called extrapolation. The proof requires that the error term for (3) can
be expanded in a series containing only even powers of h. We have already seen how
to use step sizes h and 2h to remove the term involving $h^2$. To see how $h^4$ is removed,
let $D_1(h)$ and $D_1(2h)$ denote the approximations to $f'(x_0)$ of order $O(h^4)$ obtained
with formula (10) using step sizes h and 2h, respectively. Then

(31)  $f'(x_0) = \frac{-f_2 + 8f_1 - 8f_{-1} + f_{-2}}{12h} + \frac{h^4 f^{(5)}(c_1)}{30} \approx D_1(h) + Ch^4$

and

(32)  $f'(x_0) = \frac{-f_4 + 8f_2 - 8f_{-2} + f_{-4}}{24h} + \frac{16h^4 f^{(5)}(c_2)}{30} \approx D_1(2h) + 16Ch^4.$

Suppose that $f^{(5)}(x)$ has one sign and does not change too rapidly; then the assumption
that $f^{(5)}(c_1) \approx f^{(5)}(c_2)$ can be used to eliminate the terms involving $h^4$ in (31)
and (32), and the result is

(33)  $f'(x_0) \approx \frac{16D_1(h) - D_1(2h)}{15}.$

The general pattern for improving calculations is stated in the next result.

Theorem 6.3 (Richardson's Extrapolation). Suppose that two approximations of
order $O(h^{2k})$ for $f'(x_0)$ are $D_{k-1}(h)$ and $D_{k-1}(2h)$ and that they satisfy

(34)  $f'(x_0) = D_{k-1}(h) + c_1 h^{2k} + c_2 h^{2k+2} + \cdots$

and

(35)  $f'(x_0) = D_{k-1}(2h) + 4^k c_1 h^{2k} + 4^{k+1} c_2 h^{2k+2} + \cdots.$

Then an improved approximation has the form

(36)  $f'(x_0) = D_k(h) + O(h^{2k+2}) = \frac{4^k D_{k-1}(h) - D_{k-1}(2h)}{4^k - 1} + O(h^{2k+2}).$

The following program implements the centered formula of order $O(h^2)$, equation (3),
to approximate the derivative of a function at a given point. A sequence of
approximations $\{D_k\}$ is generated, where the centered interval for $D_{k+1}$ is one-tenth
as long as the centered interval for $D_k$. The output is a matrix L=[H' D' E'], where H
is a vector containing the step sizes, D is a vector containing the approximations to the
derivative, and E is a vector containing the error bounds. Note. The function f needs
to be input as a string; that is, 'f'.


Program 6.1 (Differentiation Using Limits). To approximate $f'(x)$ numerically
by generating the sequence

$f'(x) \approx D_k = \frac{f(x + 10^{-k}h) - f(x - 10^{-k}h)}{2(10^{-k}h)}$  for k = 0, ..., n

to find the best approximation $f'(x) \approx D_n$.

function [L,n]=difflim(f,x,toler)
%Input  - f is the function input as a string 'f'
%       - x is the differentiation point
%       - toler is the tolerance for the error
%Output - L=[H' D' E']:
%           H is the vector of step sizes
%           D is the vector of approximate derivatives
%           E is the vector of error bounds
%       - n is the coordinate of the "best approximation"
max1=15;
h=1;
H(1)=h;
D(1)=(feval(f,x+h)-feval(f,x-h))/(2*h);
E(1)=0;
R(1)=0;
for n=1:2
   h=h/10;
   H(n+1)=h;
   D(n+1)=(feval(f,x+h)-feval(f,x-h))/(2*h);
   E(n+1)=abs(D(n+1)-D(n));
   R(n+1)=2*E(n+1)*(abs(D(n+1))+abs(D(n))+eps);
end
n=2;
while((E(n)>E(n+1))&(R(n)>toler))&n<max1
   h=h/10;
   H(n+2)=h;
   D(n+2)=(feval(f,x+h)-feval(f,x-h))/(2*h);
   E(n+2)=abs(D(n+2)-D(n+1));
   R(n+2)=2*E(n+2)*(abs(D(n+2))+abs(D(n+1))+eps);
   n=n+1;
end
n=length(D)-1;
L=[H' D' E'];

Program 6.2 implements Theorem 6.3 (Richardson's extrapolation). Note that the
expression for the elements in row j is algebraically equivalent to formula (36).


Program 6.2 (Differentiation Using Extrapolation). To approximate $f'(x)$
numerically by generating a table of approximations D(j, k) for $k \le j$, and using
$f'(x) \approx D(n, n)$ as the final answer. The approximations D(j, k) are stored in a
lower-triangular matrix. The first column is

$D(j, 0) = \frac{f(x + 2^{-j}h) - f(x - 2^{-j}h)}{2^{-j+1}h}$

and the elements in row j are

$D(j, k) = D(j, k-1) + \frac{D(j, k-1) - D(j-1, k-1)}{4^k - 1}$  for $1 \le k \le j$.


function [D,err,relerr,n]=diffext(f,x,delta,toler)
%Input  - f is the function input as a string 'f'
%       - x is the differentiation point
%       - delta is the tolerance for the error
%       - toler is the tolerance for the relative error
%Output - D is the matrix of approximate derivatives
%       - err is the error bound
%       - relerr is the relative error bound
%       - n is the coordinate of the "best approximation"
err=1;
relerr=1;
h=1;
j=1;
D(1,1)=(feval(f,x+h)-feval(f,x-h))/(2*h);
while relerr>toler & err>delta & j<12
   h=h/2;
   D(j+1,1)=(feval(f,x+h)-feval(f,x-h))/(2*h);
   for k=1:j
      D(j+1,k+1)=D(j+1,k)+(D(j+1,k)-D(j,k))/((4^k)-1);
   end
   err=abs(D(j+1,j+1)-D(j,j));
   relerr=2*err/(abs(D(j+1,j+1))+abs(D(j,j))+eps);
   j=j+1;
end
[n,n]=size(D);
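To see the extrapolation table at work without writing a separate function file, the same recurrence can be run as a short script. This is a sketch under stated assumptions: it uses a function handle instead of the string input of Program 6.2, a fixed number of rows (nrows, an illustrative choice) instead of the tolerance tests, and f(x) = cos(x) at x = 0.8 as the test case.

```matlab
f  = @(x) cos(x);        % function handle (assumption; Program 6.2 takes a string)
x0 = 0.8;
h  = 1;                  % initial step size, halved for each new row
nrows = 6;               % fixed table size chosen for illustration
D = zeros(nrows);
for j = 1:nrows
    D(j,1) = (f(x0+h) - f(x0-h)) / (2*h);                     % centered formula (3)
    for k = 1:j-1
        D(j,k+1) = D(j,k) + (D(j,k) - D(j-1,k)) / (4^k - 1);  % extrapolation step
    end
    h = h/2;
end
approx = D(nrows,nrows);
fprintf('%.12f  %.12f\n', approx, -sin(x0));
```

The diagonal entries of D play the role of the successive "best" approximations that Program 6.2 compares in its stopping test.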


Exercises for Approximating the Derivative

1. Let f(x) = sin(x), where x is measured in radians.

(a) Calculate approximations to $f'(0.8)$ using formula (3) with h = 0.1, h = 0.01,
and h = 0.001. Carry eight or nine decimal places.

(b) Compare with the value $f'(0.8) = \cos(0.8)$.

(c) Compute bounds for the truncation error (4). Use

$|f^{(3)}(c)| \le \cos(0.7) \approx 0.764842187$

for all cases.

2. Let $f(x) = e^x$.

(a) Calculate approximations to $f'(2.3)$ using formula (3) with h = 0.1, h = 0.01,
and h = 0.001. Carry eight or nine decimal places.

(b) Compare with the value $f'(2.3) = e^{2.3}$.

(c) Compute bounds for the truncation error (4). Use

$|f^{(3)}(c)| \le e^{2.4} \approx 11.02317638$

for all cases.

3. Let f(x) = sin(x), where x is measured in radians.

(a) Calculate approximations to $f'(0.8)$ using formula (10) with h = 0.1 and h =
0.01, and compare with $f'(0.8) = \cos(0.8)$.

(b) Use the extrapolation formula in (29) to compute the approximations to $f'(0.8)$
in part (a).

(c) Compute bounds for the truncation error (11). Use

$|f^{(5)}(c)| \le \cos(0.6) \approx 0.825335615$

for both cases.

4. Let $f(x) = e^x$.

(a) Calculate approximations to $f'(2.3)$ using formula (10) with h = 0.1 and h =
0.01, and compare with $f'(2.3) = e^{2.3}$.

(b) Use the extrapolation formula in (29) to compute the approximations to $f'(2.3)$
in part (a).

(c) Compute bounds for the truncation error (11). Use

$|f^{(5)}(c)| \le e^{2.5} \approx 12.18249396.$

5. Compare the numerical differentiation formulas (3) and (10). Let $f(x) = x^3$ and find
approximations for $f'(2)$.

(a) Use formula (3) with h = 0.05.

(b) Use formula (10) with h = 0.05.

(c) Compute bounds for the truncation errors (4) and (11).

6. (a) Use Taylor's theorem to show that

$f(x+h) = f(x) + hf'(x) + \frac{h^2 f^{(2)}(c)}{2}.$

(b) Use part (a) to show that the difference quotient in equation (2) has error of
order $O(h) = -hf^{(2)}(c)/2$.

(c) Why is formula (3) better to use than formula (2)?


7. Partial differentiation formulas. The partial derivative $f_x(x, y)$ of f(x, y) with
respect to x is obtained by holding y fixed and differentiating with respect to x.
Similarly, $f_y(x, y)$ is found by holding x fixed and differentiating with respect to y.
Formula (3) can be adapted to partial derivatives:

(i)  $f_x(x, y) = \frac{f(x+h, y) - f(x-h, y)}{2h} + O(h^2)$

     $f_y(x, y) = \frac{f(x, y+h) - f(x, y-h)}{2h} + O(h^2).$

(a) Let f(x, y) = xy/(x + y). Calculate approximations to $f_x(2, 3)$ and $f_y(2, 3)$
using the formulas in (i) with h = 0.1, 0.01, and 0.001. Compare with the
values obtained by differentiating f(x, y) partially.

(b) Let z = f(x, y) = arctan(y/x), where z is in radians. Calculate approximations
to $f_x(3, 4)$ and $f_y(3, 4)$ using the formulas in (i) with h = 0.1, 0.01, and 0.001.
Compare with the values obtained by differentiating f(x, y) partially.

8. Complete the details that show how (33) is obtained from equations (31) and (32).

9. (a) Show that (21) is the value of h that minimizes the right-hand side of (20).

(b) Show that (26) is the value of h that minimizes the right-hand side of (25).

10. The voltage E = E(t) in an electrical circuit obeys the equation E(t) = L(dI/dt) +
RI(t), where R is resistance and L is inductance. Use L = 0.05 and R = 2 and
values for I(t) in the table following.

t      I(t)
1.0    8.2277
1.1    7.2428
1.2    5.9908
1.3    4.5260
1.4    2.9122

(a) Find $I'(1.2)$ by numerical differentiation, and use it to compute E(1.2).

(b) Compare your answer with $I(t) = 10e^{-t/10}\sin(2t)$.

11. The distance D = D(t) traveled by an object is given in the table following.

t       D(t)
8.0     17.453
9.0     21.460
10.0    25.752
11.0    30.301
12.0    35.084

(a) Find the velocity V(10) by numerical differentiation.

(b) Compare your answer with $D(t) = -70 + 7t + 70e^{-t/10}$.

12. Let f(x) be given by the table following. The inherent round-off error has the bound
$|e_k| \le 5 \times 10^{-6}$. Use the rounded values in your calculations.

x        f(x) = cos(x)
1.100    0.45360
1.190    0.37166
1.199    0.36329
1.200    0.36236
1.201    0.36143
1.210    0.35302
1.300    0.26750

(a) Find approximations for $f'(1.2)$ using formula (17) with h = 0.1, h = 0.01,
and h = 0.001.

(b) Compare with $f'(1.2) = -\sin(1.2) \approx -0.93204$.

(c) Find the total error bound (19) for the three cases in part (a).

13. Let f(x) be given by the table following. The inherent round-off error has the bound
$|e_k| \le 5 \times 10^{-6}$. Use the rounded values in your calculations.

x        f(x) = ln(x)
2.900    1.06471
2.990    1.09527
2.999    1.09828
3.000    1.09861
3.001    1.09895
3.010    1.10194
3.100    1.13140

(a) Find approximations for $f'(3.0)$ using formula (17) with h = 0.1, h = 0.01,
and h = 0.001.

(b) Compare with $f'(3.0) = \frac{1}{3} \approx 0.33333$.

(c) Find the total error bound (19) for the three cases in part (a).

14. Suppose that a table of the function / (x*) is computed where the values are rounded 
off to three decimal places and the inherent round-off error is 5 x 10" 4 , Also, assume 
that \f m (c)\ < 1-5 and |/ (5) (c)| < 1.5. 

(a) Find the best step size h for formula (17), 

(b) Find the best step size h for formula (22). 

15. Let $f(x)$ be given by the following table. The inherent round-off error has the bound $|e_k| < 5 \times 10^{-6}$. Use the rounded values in your calculations.

       x       f(x) = cos(x)
     1.000     0.54030
     1.100     0.45360
     1.198     0.36422
     1.199     0.36329
     1.200     0.36236
     1.201     0.36143
     1.202     0.36049
     1.300     0.26750
     1.400     0.16997

(a) Approximate $f'(1.2)$ using (22) with $h = 0.1$ and $h = 0.001$.

(b) Find the total error bound (24) for the two cases in part (a).
16. Let $f(x)$ be given by the following table. The inherent round-off error has the bound $|e_k| < 5 \times 10^{-6}$. Use the rounded values in your calculations.

       x       f(x) = ln(x)
     2.800     1.02962
     2.900     1.06471
     2.998     1.09795
     2.999     1.09828
     3.000     1.09861
     3.001     1.09895
     3.002     1.09928
     3.100     1.13140
     3.200     1.16315

(a) Approximate $f'(3.0)$ using (22) with $h = 0.1$ and $h = 0.001$.

(b) Find the total error bound (24) for the two cases in part (a).


Algorithms and Programs 

1. Use Program 6.1 to approximate the derivatives of each of the following functions
at the given value of x. Approximations should be accurate to 13 decimal places.
Note. It may be necessary to change the values of max1 and the initial value of h in
the program.

(a) $f(x) = 60x^{45} - 32x^{33} + 233x^{5} - 47x^{2} - 77$; $x = 1/\sqrt{3}$

(b) $f(x) = \tan\left(\cos\left(\dfrac{\sqrt{5} + \sin(x)}{1 + x^{2}}\right)\right)$; $x = \dfrac{1 + \sqrt{5}}{3}$

(c) $f(x) = \sin(\cos(1/x))$; $x = 1/\sqrt{2}$

(d) $f(x) = \sin(x^{3} - 7x^{2} + 6x + 8)$; $x = \dfrac{1 - \sqrt{5}}{2}$

(e) $f(x) = x^{x^{x}}$; $x = 0.0001$

2. Modify Program 6.1 to implement the centered formula (10) of order $O(h^4)$. Use this
program to approximate the derivatives of the functions given in Problem 1. Again,
approximations should be accurate to 13 decimal places.

3. Use Program 6.2 to approximate the derivatives of the functions given in Problem 1.
Again, approximations should be accurate to 13 decimal places. Note. It may be
necessary to change the initial values of err, relerr, and h.


Numerical Differentiation Formulas 

More Central-difference Formulas 


The formulas for $f'(x_0)$ in the preceding section required that the function can be
computed at abscissas that lie on both sides of $x$, and they were referred to as central-difference
formulas. Taylor series can be used to obtain central-difference formulas for
the higher derivatives. The popular choices are those of order $O(h^2)$ and $O(h^4)$ and are
given in Tables 6.3 and 6.4. In these tables we use the convention that $f_k = f(x_0 + kh)$
for $k = -3, -2, -1, 0, 1, 2, 3$.

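For illustration, we will derive the formula for $f''(x)$ of order $O(h^2)$ in Table 6.3.
Start with the Taylor expansions

$$f(x+h) = f(x) + hf'(x) + \frac{h^2 f''(x)}{2} + \frac{h^3 f^{(3)}(x)}{6} + \frac{h^4 f^{(4)}(x)}{24} + \cdots \tag{1}$$

and

$$f(x-h) = f(x) - hf'(x) + \frac{h^2 f''(x)}{2} - \frac{h^3 f^{(3)}(x)}{6} + \frac{h^4 f^{(4)}(x)}{24} - \cdots. \tag{2}$$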

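Table 6.3 Central-difference Formulas of Order $O(h^2)$

$$f'(x_0) \approx \frac{f_1 - f_{-1}}{2h}$$

$$f''(x_0) \approx \frac{f_1 - 2f_0 + f_{-1}}{h^2}$$

$$f^{(3)}(x_0) \approx \frac{f_2 - 2f_1 + 2f_{-1} - f_{-2}}{2h^3}$$

$$f^{(4)}(x_0) \approx \frac{f_2 - 4f_1 + 6f_0 - 4f_{-1} + f_{-2}}{h^4}$$


Table 6.4 Central-difference Formulas of Order $O(h^4)$

$$f'(x_0) \approx \frac{-f_2 + 8f_1 - 8f_{-1} + f_{-2}}{12h}$$

$$f''(x_0) \approx \frac{-f_2 + 16f_1 - 30f_0 + 16f_{-1} - f_{-2}}{12h^2}$$

$$f^{(3)}(x_0) \approx \frac{-f_3 + 8f_2 - 13f_1 + 13f_{-1} - 8f_{-2} + f_{-3}}{8h^3}$$

$$f^{(4)}(x_0) \approx \frac{-f_3 + 12f_2 - 39f_1 + 56f_0 - 39f_{-1} + 12f_{-2} - f_{-3}}{6h^4}$$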
Adding equations (1) and (2) will eliminate the terms involving the odd derivatives
$f'(x), f^{(3)}(x), f^{(5)}(x), \ldots$:

$$f(x+h) + f(x-h) = 2f(x) + \frac{2h^2 f''(x)}{2} + \frac{2h^4 f^{(4)}(x)}{24} + \cdots. \tag{3}$$

Solving equation (3) for $f''(x)$ yields

$$f''(x) = \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} - \frac{2h^2 f^{(4)}(x)}{4!} - \frac{2h^4 f^{(6)}(x)}{6!} - \cdots - \frac{2h^{2k-2} f^{(2k)}(x)}{(2k)!} - \cdots. \tag{4}$$

If the series in (4) is truncated at the fourth derivative, there exists a value $c$ that
lies in $[x - h, x + h]$ so that

$$f''(x_0) = \frac{f_1 - 2f_0 + f_{-1}}{h^2} - \frac{h^2 f^{(4)}(c)}{12}. \tag{5}$$

This gives us the desired formula for approximating $f''(x)$:

$$f''(x_0) \approx \frac{f_1 - 2f_0 + f_{-1}}{h^2}. \tag{6}$$

Example 6.4. Let $f(x) = \cos(x)$.

(a) Use formula (6) with $h = 0.1$, $0.01$, and $0.001$ and find approximations to $f''(0.8)$.
Carry nine decimal places in all calculations.

(b) Compare with the true value $f''(0.8) = -\cos(0.8)$.

(a) The calculation for $h = 0.01$ is

$$f''(0.8) \approx \frac{f(0.81) - 2f(0.80) + f(0.79)}{0.0001} \approx \frac{0.689498433 - 2(0.696706709) + 0.703845316}{0.0001} \approx -0.696690000.$$

(b) The error in this approximation is $-0.000016709$. The other calculations are summarized
in Table 6.5. The error analysis will illuminate this example and show why $h = 0.01$
was best. ■
Table 6.5 Numerical Approximations to $f''(x)$ for Example 6.4

    Step size     Approximation by formula (6)     Error using formula (6)
    h = 0.1       -0.696126300                     -0.000580409
    h = 0.01      -0.696690000                     -0.000016709
    h = 0.001     -0.696000000                     -0.000706709
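
The hand calculation of Example 6.4 can be reproduced with a short MATLAB sketch. It assumes that "carry nine decimal places" means rounding each value of cos(x) to nine decimals before differencing; the names fr, d2, and err are illustrative and do not come from the text.

```matlab
% Formula (6) applied to f(x) = cos(x) at x = 0.8 (Example 6.4).
% Assumption: function values are rounded to nine decimal places,
% mimicking the hand calculation in the text.
fr = @(x) round(cos(x)*1e9)/1e9;   % cos(x) rounded to nine decimals
x  = 0.8;
h  = [0.1 0.01 0.001];
d2 = zeros(size(h));
for i = 1:numel(h)
    d2(i) = (fr(x+h(i)) - 2*fr(x) + fr(x-h(i))) / h(i)^2;   % formula (6)
end
err = -fr(x) - d2;                 % true value f''(0.8) = -cos(0.8)
fprintf('h = %5.3f   approx = %12.9f   error = %12.9f\n', [h; d2; err]);
```

With the rounded values, the smallest error occurs at h = 0.01; at h = 0.001 the division by h^2 magnifies the rounding in the ninth decimal place.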


Error Analysis 

Let $f_k = y_k + e_k$, where $e_k$ is the error in computing $f(x_k)$, including noise in measurement
and round-off error. Then formula (6) can be written

$$f''(x_0) = \frac{y_1 - 2y_0 + y_{-1}}{h^2} + E(f, h). \tag{7}$$

The error term $E(f, h)$ for the numerical derivative (7) will have a part due to round-off
error and a part due to truncation error:

$$E(f, h) = \frac{e_1 - 2e_0 + e_{-1}}{h^2} - \frac{h^2 f^{(4)}(c)}{12}. \tag{8}$$

If it is assumed that each error $e_k$ is of the magnitude $\epsilon$, with signs that accumulate
errors, and that $|f^{(4)}(x)| \le M$, then we get the following error bound:

$$|E(f, h)| \le \frac{4\epsilon}{h^2} + \frac{Mh^2}{12}. \tag{9}$$

If $h$ is small, then the contribution $4\epsilon/h^2$ due to round-off error is large. When $h$
is large, the contribution $Mh^2/12$ is large. The optimum step size will minimize the
quantity

$$g(h) = \frac{4\epsilon}{h^2} + \frac{Mh^2}{12}. \tag{10}$$

Setting $g'(h) = 0$ results in $-8\epsilon/h^3 + Mh/6 = 0$, which yields the equation
$h^4 = 48\epsilon/M$, from which we obtain the optimal value:

$$h = \left(\frac{48\epsilon}{M}\right)^{1/4}. \tag{11}$$

When formula (11) is applied to Example 6.4, use the bound $|f^{(4)}(x)| \le |\cos(x)| \le 1 = M$
and the value $\epsilon = 0.5 \times 10^{-9}$. The optimal step size is $h = (24 \times 10^{-9}/1)^{1/4} = 0.01244666$,
and we see that $h = 0.01$ was closest to the optimal value.
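
As a small numerical check of the step-size analysis, the following sketch evaluates the bound g(h) from (10) at the three step sizes of Example 6.4 and computes the optimal value from (11). The variable names (epsilon, M, g, hopt) are chosen for illustration only.

```matlab
% Error bound (10) and optimal step size (11) for formula (6).
% Assumed values from Example 6.4: epsilon = 0.5e-9 and M = 1.
epsilon = 0.5e-9;                              % magnitude of each round-off error
M       = 1;                                   % bound on |f^(4)(x)| for cos(x)
g       = @(h) 4*epsilon./h.^2 + M*h.^2/12;    % bound (10)
hopt    = (48*epsilon/M)^(1/4);                % optimal step size (11)
h       = [0.1 0.01 0.001];
disp([h; g(h)]);                               % bound at the tabulated step sizes
fprintf('optimal h = %.8f, bound g(hopt) = %.3e\n', hopt, g(hopt));
```

Of the three tabulated step sizes, the bound is smallest at h = 0.01, which agrees with the errors observed in Table 6.5.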
Since the portion of the error due to round off is inversely proportional to the square
of $h$, this term grows when $h$ gets small. This is sometimes referred to as the step-size
dilemma. One partial solution to this problem is to use a formula of higher order so
that a larger value of $h$ will produce the desired accuracy. The formula for $f''(x_0)$ of
order $O(h^4)$ in Table 6.4 is

$$f''(x_0) = \frac{-f_2 + 16f_1 - 30f_0 + 16f_{-1} - f_{-2}}{12h^2} + E(f, h). \tag{12}$$

The error term for (12) has the form

$$E(f, h) = \frac{16\epsilon}{3h^2} + \frac{h^4 f^{(6)}(c)}{90}, \tag{13}$$

where $c$ lies in the interval $[x - 2h, x + 2h]$. A bound for $|E(f, h)|$ is

$$|E(f, h)| \le \frac{16\epsilon}{3h^2} + \frac{h^4 M}{90}, \tag{14}$$

where $|f^{(6)}(x)| \le M$. The optimal value for $h$ is given by the formula

$$h = \left(\frac{240\epsilon}{M}\right)^{1/6}. \tag{15}$$

Example 6.5. Let $f(x) = \cos(x)$.

(a) Use formula (12) with $h = 1.0$, $0.1$, and $0.01$ and find approximations to $f''(0.8)$.
Carry nine decimal places in all the calculations.

(b) Compare with the true value $f''(0.8) = -\cos(0.8)$.

(c) Determine the optimal step size.

(a) The calculation for $h = 0.1$ is

$$f''(0.8) \approx \frac{-f(1.0) + 16f(0.9) - 30f(0.8) + 16f(0.7) - f(0.6)}{0.12}$$

$$\approx \frac{-0.540302306 + 9.945759488 - 20.90120127 + 12.23747499 - 0.825335615}{0.12} \approx -0.696705958.$$

(b) The error in this approximation is $-0.000000751$. The other calculations are summarized
in Table 6.6.

(c) When formula (15) is applied, we can use the bound $|f^{(6)}(x)| \le |\cos(x)| \le 1 = M$ and
the value $\epsilon = 0.5 \times 10^{-9}$. These values give the optimal step size $h = (120 \times 10^{-9}/1)^{1/6} = 0.070231219$. ■


Table 6.6 Numerical Approximations to $f''(x)$ for Example 6.5

    Step size     Approximation by formula (12)     Error using formula (12)
    h = 1.0       -0.689625413                      -0.007081296
    h = 0.1       -0.696705958                      -0.000000751
    h = 0.01      -0.696690000                      -0.000016709
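
The fourth-order formula (12) and the optimal step size (15) can be evaluated in the same way. The sketch below is an illustration only: it works in full double precision rather than with nine rounded decimals, so the round-off contribution is much smaller than in Table 6.6 and the error at h = 0.1 is essentially the truncation term of (13).

```matlab
% Formula (12) for f''(x) with f(x) = cos(x) at x = 0.8, and the
% optimal step size (15). Assumption: full double precision is used
% here, so round-off is far smaller than in the nine-digit hand
% calculation of Example 6.5.
f  = @(x) cos(x);
x  = 0.8;
h  = 0.1;
d2 = (-f(x+2*h) + 16*f(x+h) - 30*f(x) + 16*f(x-h) - f(x-2*h)) / (12*h^2);
err = -cos(x) - d2;                       % true value minus approximation
epsilon = 0.5e-9;                         % round-off level of Example 6.5
M       = 1;                              % bound on |f^(6)(x)|
hopt    = (240*epsilon/M)^(1/6);          % formula (15)
fprintf('approx = %.9f  error = %.3e  optimal h = %.9f\n', d2, err, hopt);
```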


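Table 6.7 Forward- and Backward-difference Formulas of Order $O(h^2)$

$$f'(x_0) \approx \frac{-3f_0 + 4f_1 - f_2}{2h} \quad \text{(forward difference)}$$

$$f'(x_0) \approx \frac{3f_0 - 4f_{-1} + f_{-2}}{2h} \quad \text{(backward difference)}$$

$$f''(x_0) \approx \frac{2f_0 - 5f_1 + 4f_2 - f_3}{h^2} \quad \text{(forward difference)}$$

$$f''(x_0) \approx \frac{2f_0 - 5f_{-1} + 4f_{-2} - f_{-3}}{h^2} \quad \text{(backward difference)}$$

$$f^{(4)}(x_0) \approx \frac{3f_0 - 14f_1 + 26f_2 - 24f_3 + 11f_4 - 2f_5}{h^4} \quad \text{(forward difference)}$$

$$f^{(4)}(x_0) \approx \frac{3f_0 - 14f_{-1} + 26f_{-2} - 24f_{-3} + 11f_{-4} - 2f_{-5}}{h^4} \quad \text{(backward difference)}$$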

Generally, if numerical differentiation is performed, only about half the accuracy
of which the computer is capable is obtained. This severe loss of significant digits will
almost always occur unless we are fortunate to find a step size that is optimal. Hence
we must always proceed with caution when numerical differentiation is performed.
The difficulties are more pronounced when working with experimental data, where
the function values have been rounded to only a few digits. If a numerical derivative
must be obtained from data, we should consider curve fitting, by using least-squares
techniques, and differentiate the formula for the curve.
Differentiation of the Lagrange Polynomial

If the function must be evaluated at abscissas that lie on one side of $x_0$, the central-difference
formulas cannot be used. Formulas for equally spaced abscissas that lie to
the right (or left) of $x_0$ are called forward (or backward) difference formulas. These
formulas can be derived by differentiation of the Lagrange interpolation polynomial.
Some of the common forward- and backward-difference formulas are given in Table 6.7.

Example 6.6. Derive the formula

$$f''(x_0) \approx \frac{2f_0 - 5f_1 + 4f_2 - f_3}{h^2}.$$

Start with the Lagrange interpolation polynomial for $f(t)$ based on the four points $x_0$,
$x_1$, $x_2$, and $x_3$:

$$f(t) \approx f_0\frac{(t-x_1)(t-x_2)(t-x_3)}{(x_0-x_1)(x_0-x_2)(x_0-x_3)} + f_1\frac{(t-x_0)(t-x_2)(t-x_3)}{(x_1-x_0)(x_1-x_2)(x_1-x_3)}$$

$$\qquad + f_2\frac{(t-x_0)(t-x_1)(t-x_3)}{(x_2-x_0)(x_2-x_1)(x_2-x_3)} + f_3\frac{(t-x_0)(t-x_1)(t-x_2)}{(x_3-x_0)(x_3-x_1)(x_3-x_2)}.$$

Differentiate the products in the numerators twice and get

$$f''(t) \approx f_0\frac{2((t-x_1)+(t-x_2)+(t-x_3))}{(x_0-x_1)(x_0-x_2)(x_0-x_3)} + f_1\frac{2((t-x_0)+(t-x_2)+(t-x_3))}{(x_1-x_0)(x_1-x_2)(x_1-x_3)}$$

$$\qquad + f_2\frac{2((t-x_0)+(t-x_1)+(t-x_3))}{(x_2-x_0)(x_2-x_1)(x_2-x_3)} + f_3\frac{2((t-x_0)+(t-x_1)+(t-x_2))}{(x_3-x_0)(x_3-x_1)(x_3-x_2)}.$$

Then substitution of $t = x_0$ and the fact that $x_i - x_j = (i - j)h$ produces

$$f''(x_0) \approx f_0\frac{2((x_0-x_1)+(x_0-x_2)+(x_0-x_3))}{(x_0-x_1)(x_0-x_2)(x_0-x_3)} + f_1\frac{2((x_0-x_0)+(x_0-x_2)+(x_0-x_3))}{(x_1-x_0)(x_1-x_2)(x_1-x_3)}$$

$$\qquad + f_2\frac{2((x_0-x_0)+(x_0-x_1)+(x_0-x_3))}{(x_2-x_0)(x_2-x_1)(x_2-x_3)} + f_3\frac{2((x_0-x_0)+(x_0-x_1)+(x_0-x_2))}{(x_3-x_0)(x_3-x_1)(x_3-x_2)}$$

$$= f_0\frac{2((-h)+(-2h)+(-3h))}{(-h)(-2h)(-3h)} + f_1\frac{2((0)+(-2h)+(-3h))}{(h)(-h)(-2h)} + f_2\frac{2((0)+(-h)+(-3h))}{(2h)(h)(-h)} + f_3\frac{2((0)+(-h)+(-2h))}{(3h)(2h)(h)}$$

$$= f_0\frac{-12h}{-6h^3} + f_1\frac{-10h}{2h^3} + f_2\frac{-8h}{-2h^3} + f_3\frac{-6h}{6h^3} = \frac{2f_0 - 5f_1 + 4f_2 - f_3}{h^2},$$

and the formula is established. ■


Example 6.7. Derive the formula

$$f'''(x_0) \approx \frac{-5f_0 + 18f_1 - 24f_2 + 14f_3 - 3f_4}{2h^3}.$$

Start with the Lagrange interpolation polynomial for $f(t)$ based on the five points $x_0$,
$x_1$, $x_2$, $x_3$, and $x_4$:

$$f(t) \approx f_0\frac{(t-x_1)(t-x_2)(t-x_3)(t-x_4)}{(x_0-x_1)(x_0-x_2)(x_0-x_3)(x_0-x_4)} + f_1\frac{(t-x_0)(t-x_2)(t-x_3)(t-x_4)}{(x_1-x_0)(x_1-x_2)(x_1-x_3)(x_1-x_4)}$$

$$\qquad + f_2\frac{(t-x_0)(t-x_1)(t-x_3)(t-x_4)}{(x_2-x_0)(x_2-x_1)(x_2-x_3)(x_2-x_4)} + f_3\frac{(t-x_0)(t-x_1)(t-x_2)(t-x_4)}{(x_3-x_0)(x_3-x_1)(x_3-x_2)(x_3-x_4)}$$

$$\qquad + f_4\frac{(t-x_0)(t-x_1)(t-x_2)(t-x_3)}{(x_4-x_0)(x_4-x_1)(x_4-x_2)(x_4-x_3)}.$$

Differentiate the numerators three times, then use the substitution $x_i - x_j = (i - j)h$ in the
denominators and get

$$f'''(t) \approx f_0\frac{6((t-x_1)+(t-x_2)+(t-x_3)+(t-x_4))}{(-h)(-2h)(-3h)(-4h)} + f_1\frac{6((t-x_0)+(t-x_2)+(t-x_3)+(t-x_4))}{(h)(-h)(-2h)(-3h)}$$

$$\qquad + f_2\frac{6((t-x_0)+(t-x_1)+(t-x_3)+(t-x_4))}{(2h)(h)(-h)(-2h)} + f_3\frac{6((t-x_0)+(t-x_1)+(t-x_2)+(t-x_4))}{(3h)(2h)(h)(-h)}$$

$$\qquad + f_4\frac{6((t-x_0)+(t-x_1)+(t-x_2)+(t-x_3))}{(4h)(3h)(2h)(h)}.$$

Then substitution of $t = x_0$ in the form $t - x_j = x_0 - x_j = -jh$ produces

$$f'''(x_0) \approx f_0\frac{6((-h)+(-2h)+(-3h)+(-4h))}{24h^4} + f_1\frac{6((0)+(-2h)+(-3h)+(-4h))}{-6h^4}$$

$$\qquad + f_2\frac{6((0)+(-h)+(-3h)+(-4h))}{4h^4} + f_3\frac{6((0)+(-h)+(-2h)+(-4h))}{-6h^4} + f_4\frac{6((0)+(-h)+(-2h)+(-3h))}{24h^4}$$

$$= f_0\frac{-60h}{24h^4} + f_1\frac{54h}{6h^4} + f_2\frac{-48h}{4h^4} + f_3\frac{42h}{6h^4} + f_4\frac{-36h}{24h^4} = \frac{-5f_0 + 18f_1 - 24f_2 + 14f_3 - 3f_4}{2h^3},$$

and the formula is established. ■
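
The weights obtained in Examples 6.6 and 6.7 can be cross-checked numerically. The sketch below does not differentiate the Lagrange polynomial; it uses an equivalent technique (assumed here for illustration) that requires the rule to be exact for the monomials 1, t, ..., t^(n-1), which leads to a small linear system. The names cases, V, b, w, and W are hypothetical.

```matlab
% Cross-check of the forward-difference weights in Examples 6.6 and 6.7.
% Technique (an assumption of this sketch, not the derivation in the text):
% require the rule sum(w.*f)/h^d to be exact for 1, t, ..., t^(n-1) at
% nodes x0 + s*h, which gives a small linear system for w.
cases = {0:3, 2; 0:4, 3};            % {node offsets s, derivative order d}
W = cell(size(cases, 1), 1);
for c = 1:size(cases, 1)
    s = cases{c, 1};
    d = cases{c, 2};
    n = numel(s);
    V = zeros(n);
    for m = 0:n-1
        V(m+1, :) = s.^m;            % row m+1: moments of order m
    end
    b = zeros(n, 1);
    b(d+1) = factorial(d);           % d-th derivative of t^d at t = 0
    w = (V \ b)';                    % weights multiplying f_0, f_1, ...
    W{c} = w;
    fprintf('d = %d, weights: %s\n', d, mat2str(w, 6));
end
```

The first case returns the numerator coefficients of Example 6.6, and the second returns those of Example 6.7 already divided by 2.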
Differentiation of the Newton Polynomial

In this section we show the relationship between the three formulas of order $O(h^2)$ for
approximating $f'(x_0)$, and a general algorithm is given for computing the numerical
derivative. In Section 4.3 we saw that the Newton polynomial $P(t)$ of degree $N = 2$
that approximates $f(t)$ using the nodes $t_0$, $t_1$, and $t_2$ is

$$P(t) = a_0 + a_1(t - t_0) + a_2(t - t_0)(t - t_1), \tag{16}$$

where $a_0 = f(t_0)$, $a_1 = (f(t_1) - f(t_0))/(t_1 - t_0)$, and

$$a_2 = \frac{\dfrac{f(t_2) - f(t_1)}{t_2 - t_1} - \dfrac{f(t_1) - f(t_0)}{t_1 - t_0}}{t_2 - t_0}.$$

The derivative of $P(t)$ is

$$P'(t) = a_1 + a_2((t - t_0) + (t - t_1)), \tag{17}$$

and when it is evaluated at $t = t_0$, the result is

$$P'(t_0) = a_1 + a_2(t_0 - t_1) \approx f'(t_0). \tag{18}$$

Observe that the nodes $\{t_k\}$ do not need to be equally spaced for formulas (16)
through (18) to hold. Choosing the abscissas in different orders will produce different
formulas for approximating $f'(x)$.

Case (i): If $t_0 = x$, $t_1 = x + h$, and $t_2 = x + 2h$, then

$$a_1 = \frac{f(x+h) - f(x)}{h}, \qquad a_2 = \frac{f(x) - 2f(x+h) + f(x+2h)}{2h^2}.$$

When these values are substituted into (18), we get

$$P'(x) = \frac{f(x+h) - f(x)}{h} + \frac{-f(x) + 2f(x+h) - f(x+2h)}{2h}.$$

This is simplified to obtain

$$P'(x) = \frac{-3f(x) + 4f(x+h) - f(x+2h)}{2h} \approx f'(x), \tag{19}$$

which is the second-order forward-difference formula for $f'(x)$.

Case (ii): If $t_0 = x$, $t_1 = x + h$, and $t_2 = x - h$, then

$$a_1 = \frac{f(x+h) - f(x)}{h}, \qquad a_2 = \frac{f(x+h) - 2f(x) + f(x-h)}{2h^2}.$$

When these values are substituted into (18), we get

$$P'(x) = \frac{f(x+h) - f(x)}{h} + \frac{-f(x+h) + 2f(x) - f(x-h)}{2h}.$$

This is simplified to obtain

$$P'(x) = \frac{f(x+h) - f(x-h)}{2h} \approx f'(x), \tag{20}$$

which is the second-order central-difference formula for $f'(x)$.

Case (iii): If $t_0 = x$, $t_1 = x - h$, and $t_2 = x - 2h$, then

$$a_1 = \frac{f(x) - f(x-h)}{h}, \qquad a_2 = \frac{f(x) - 2f(x-h) + f(x-2h)}{2h^2}.$$

These values are substituted into (18) and simplified to get

$$P'(x) = \frac{3f(x) - 4f(x-h) + f(x-2h)}{2h} \approx f'(x), \tag{21}$$

which is the second-order backward-difference formula for $f'(x)$.

The Newton polynomial $P(t)$ of degree $N$ that approximates $f(t)$ using the nodes
$t_0, t_1, \ldots, t_N$ is

$$P(t) = a_0 + a_1(t - t_0) + a_2(t - t_0)(t - t_1) + a_3(t - t_0)(t - t_1)(t - t_2) + \cdots + a_N(t - t_0)\cdots(t - t_{N-1}). \tag{22}$$

The derivative of $P(t)$ is

$$P'(t) = a_1 + a_2((t - t_0) + (t - t_1)) + a_3((t - t_0)(t - t_1) + (t - t_0)(t - t_2) + (t - t_1)(t - t_2)) + \cdots + a_N\sum_{k=0}^{N-1}\prod_{\substack{j=0 \\ j \ne k}}^{N-1}(t - t_j). \tag{23}$$

When $P'(t)$ is evaluated at $t = t_0$, several of the terms in the summation are zero,
and $P'(t_0)$ has the simpler form

$$P'(t_0) = a_1 + a_2(t_0 - t_1) + a_3(t_0 - t_1)(t_0 - t_2) + \cdots + a_N(t_0 - t_1)(t_0 - t_2)(t_0 - t_3)\cdots(t_0 - t_{N-1}). \tag{24}$$
The $k$th partial sum on the right side of equation (24) is the derivative of the Newton
polynomial of degree $k$ based on the first $k$ nodes. If

$$|t_0 - t_1| \le |t_0 - t_2| \le \cdots \le |t_0 - t_N|, \quad \text{and } \{(t_j, 0)\}_{j=0}^{N}$$

forms a set of $N + 1$ equally spaced points on the real axis, the $k$th partial sum is an
approximation to $f'(t_0)$ of order $O(h^{k-1})$.

Suppose that $N = 5$. If the five nodes are $t_k = x + hk$ for $k = 0, 1, 2, 3,$ and $4$,
then (24) is an equivalent way to compute the forward-difference formula for $f'(x)$ of
order $O(h^4)$. If the five nodes $\{t_k\}$ are chosen to be $t_0 = x$, $t_1 = x + h$, $t_2 = x - h$,
$t_3 = x + 2h$, and $t_4 = x - 2h$, then (24) is the central-difference formula for $f'(x)$ of
order $O(h^4)$. When the five nodes are $t_k = x - kh$, then (24) is the backward-difference
formula for $f'(x)$ of order $O(h^4)$.

The following program is an extension of Program 4.2 and can be used to implement
formula (24). Note that the nodes do not need to be equally spaced. Also, it
computes the derivative at only one point, $f'(x_0)$.

Program 6.3 (Differentiation Based on $N + 1$ Nodes). To approximate $f'(x)$
numerically by constructing the $N$th-degree Newton polynomial

$$P(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1) + a_3(x - x_0)(x - x_1)(x - x_2) + \cdots + a_N(x - x_0)\cdots(x - x_{N-1})$$

and using $f'(x_0) \approx P'(x_0)$ as the final answer. The method must be used at $x_0$.
The points can be rearranged $\{x_k, x_0, \ldots, x_{k-1}, x_{k+1}, \ldots, x_N\}$ to compute $f'(x_k) \approx P'(x_k)$.

function [A,df]=diffnew(X,Y)

%Input  - X is the 1xn abscissa vector
%       - Y is the 1xn ordinate vector
%Output - A is the 1xn vector containing the coefficients of
%         the Nth-degree Newton polynomial
%       - df is the approximate derivative

% Build the divided-difference coefficients in place
A=Y;
N=length(X);
for j=2:N
   for k=N:-1:j
      A(k)=(A(k)-A(k-1))/(X(k)-X(k-j+1));
   end
end

% Accumulate the partial sums of formula (24) at x0 = X(1)
x0=X(1);
df=A(2);
prod=1;
n1=length(A)-1;
for k=2:n1
   prod=prod*(x0-X(k));
   df=df+prod*A(k+1);
end
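
To see the program's steps in action, the following sketch writes them inline for five nodes in the central ordering described in the text. The choices f(x) = cos(x), x = 0.8, and h = 0.1 are assumptions made for illustration, and the accumulator is named p here rather than prod.

```matlab
% Steps of Program 6.3 written inline for five nodes around x = 0.8.
% Assumptions: f(x) = cos(x) and h = 0.1 are illustrative choices; the
% node order x, x+h, x-h, x+2h, x-2h is the central ordering from the text.
x = 0.8;  h = 0.1;
X = x + h*[0 1 -1 2 -2];
Y = cos(X);
A = Y;                                   % divided-difference table, in place
N = length(X);
for j = 2:N
    for k = N:-1:j
        A(k) = (A(k) - A(k-1)) / (X(k) - X(k-j+1));
    end
end
df = A(2);                               % partial sums of formula (24)
p  = 1;
for k = 2:N-1
    p  = p*(X(1) - X(k));
    df = df + p*A(k+1);
end
central = (-cos(x+2*h) + 8*cos(x+h) - 8*cos(x-h) + cos(x-2*h)) / (12*h);
fprintf('Newton: %.10f  central O(h^4): %.10f  exact: %.10f\n', ...
        df, central, -sin(x));
```

Because the interpolating polynomial through the five nodes is unique, the value from the Newton form agrees with the central-difference formula of order O(h^4) from Table 6.4 up to round-off.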


Exercises for Numerical Differentiation Formulas 

1. Let $f(x) = \ln(x)$ and carry eight or nine decimal places.

(a) Use formula (6) with $h = 0.05$ to approximate $f''(5)$.

(b) Use formula (6) with $h = 0.01$ to approximate $f''(5)$.

(c) Use formula (12) with $h = 0.1$ to approximate $f''(5)$.

(d) Which answer, (a), (b), or (c), is most accurate?

2. Let $f(x) = \cos(x)$ and carry eight or nine decimal places.

(a) Use formula (6) with $h = 0.05$ to approximate $f''(1)$.

(b) Use formula (6) with $h = 0.01$ to approximate $f''(1)$.

(c) Use formula (12) with $h = 0.1$ to approximate $f''(1)$.

(d) Which answer, (a), (b), or (c), is most accurate?

3. Consider the table for $f(x) = \ln(x)$ rounded to four decimal places.

(a) Use formula (6) with $h = 0.05$ to approximate $f''(5)$.

(b) Use formula (6) with $h = 0.01$ to approximate $f''(5)$.

(c) Use formula (12) with $h = 0.05$ to approximate $f''(5)$.

(d) Which answer, (a), (b), or (c), is most accurate?

4. Consider the table for $f(x) = \cos(x)$ rounded to four decimal places.

(a) Use formula (6) with $h = 0.05$ to approximate $f''(1)$.

(b) Use formula (6) with $h = 0.01$ to approximate $f''(1)$.

(c) Use formula (12) with $h = 0.05$ to approximate $f''(1)$.

(d) Which answer, (a), (b), or (c), is most accurate?

5. Use the numerical differentiation formula (6) and $h = 0.01$ to approximate $f''(1)$ for
the functions

(a) $f(x) = x^2$    (b) $f(x) = x^4$


the functions

(a) $f(x) = x^4$    (b) $f(x) = x^6$


7. Use the Taylor expansions for $f(x + h)$, $f(x - h)$, $f(x + 2h)$, and $f(x - 2h)$ and
derive the central-difference formula:

$$f^{(3)}(x) \approx \frac{f(x + 2h) - 2f(x + h) + 2f(x - h) - f(x - 2h)}{2h^3}.$$

8. Use the Taylor expansions for $f(x + h)$, $f(x - h)$, $f(x + 2h)$, and $f(x - 2h)$ and
derive the formula:

$$f^{(4)}(x) \approx \frac{f(x + 2h) - 4f(x + h) + 6f(x) - 4f(x - h) + f(x - 2h)}{h^4}.$$


9. Find the approximations to $f'(x_k)$ of order $O(h^2)$ at each of the four points in the
tables.

(a)
       x       f(x)
      0.0     0.989992
      0.1     0.999135
      0.2     0.998295
      0.3     0.987480

(b)
       x       f(x)
      0.0     0.141120
      0.1     0.041581
      0.2    -0.058374
      0.3    -0.157746


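10. Use the approximations

$$f'\left(x + \frac{h}{2}\right) \approx \frac{f_1 - f_0}{h} \quad \text{and} \quad f'\left(x - \frac{h}{2}\right) \approx \frac{f_0 - f_{-1}}{h}$$

and derive the approximation

$$f''(x) \approx \frac{f_1 - 2f_0 + f_{-1}}{h^2}.$$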


11. Use formulas (16) through (18) and derive a formula for $f'(x)$ based on the abscissas
$t_0 = x$, $t_1 = x + h$, and $t_2 = x + 3h$.

12. Use formulas (16) through (18) and derive a formula for $f'(x)$ based on the abscissas
$t_0 = x$, $t_1 = x - h$, and $t_2 = x + 2h$.
13. The numerical solution of a certain differential equation requires an approximation to
$f''(x) + f'(x)$ of order $O(h^2)$.

(a) Find the central-difference formula for $f''(x) + f'(x)$ by adding the formulas
for $f'(x)$ and $f''(x)$ of order $O(h^2)$.

(b) Find the forward-difference formula for $f''(x) + f'(x)$ by adding the formulas
for $f'(x)$ and $f''(x)$ of order $O(h^2)$.

(c) What would happen if a formula for $f'(x)$ of order $O(h^4)$ were added to a
formula for $f''(x)$ of order $O(h^2)$?

14. Critique the following argument. Taylor's formula can be used to get the representations

$$f(x + h) = f(x) + hf'(x) + \frac{h^2 f''(x)}{2} + \frac{h^3 f'''(c)}{6}$$

and

$$f(x - h) = f(x) - hf'(x) + \frac{h^2 f''(x)}{2} - \frac{h^3 f'''(c)}{6}.$$

Adding these quantities results in

$$f(x + h) + f(x - h) = 2f(x) + h^2 f''(x),$$

which can be solved to obtain an exact formula for $f''(x)$:

$$f''(x) = \frac{f(x + h) - 2f(x) + f(x - h)}{h^2}.$$


Algorithms and Programs

1. Modify Program 6.3 so that it will calculate $P'(x_M)$ for $M = 1, 2, \ldots, N + 1$.
Numerical Integration


Numerical integration is a primary tool used by engineers and scientists to obtain approximate
answers for definite integrals that cannot be solved analytically. In the area
of statistical thermodynamics, the Debye model for calculating the heat capacity of a
solid involves the following function:

$$\Phi(x) = \int_0^x \frac{t^3}{e^t - 1}\,dt.$$

Since there is no analytic expression for $\Phi(x)$, numerical integration must be used to
obtain approximate values. For example, the value $\Phi(5)$ is the area under the curve
$y = f(t) = t^3/(e^t - 1)$ for $0 \le t \le 5$ (see Figure 7.1). The numerical approximation
for $\Phi(5)$ is

$$\Phi(5) = \int_0^5 \frac{t^3}{e^t - 1}\,dt \approx 4.8998922.$$

Each additional value of $\Phi(x)$ must be determined by another numerical integration.
Table 7.1 lists several of these approximations over the interval $[1, 10]$.

Table 7.1 Values of $\Phi(x)$

       x       Φ(x)
      1.0     0.2248052
      2.0     1.1763426
      3.0     2.5522185
      4.0     3.8770542
      5.0     4.8998922
      6.0     5.5858554
      7.0     6.0031690
      8.0     6.2396238
      9.0     6.3665739
     10.0     6.4319219

The purpose of this chapter is to develop the basic principles of numerical integration.
In Chapter 9, numerical integration formulas are used to derive the predictor-corrector
methods for solving differential equations.
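
As a rough illustration of how a value such as Φ(5) might be computed, the sketch below applies a composite Simpson's rule (introduced later in this section) to the Debye integrand. The number of subintervals M = 1000 and the variable names are assumptions of this sketch; the integrand is set to its limiting value 0 at t = 0, where the formula gives 0/0.

```matlab
% Approximate Phi(5) = integral of t^3/(e^t - 1) over [0, 5]
% with a composite Simpson's rule. Assumption: M = 1000 subintervals.
f = @(t) t.^3 ./ (exp(t) - 1);
a = 0;  b = 5;  M = 1000;          % M must be even
t = linspace(a, b, M+1);
y = f(t);
y(1) = 0;                          % limit of t^3/(e^t - 1) as t -> 0
h = (b - a)/M;
Phi5 = h/3 * (y(1) + 4*sum(y(2:2:M)) + 2*sum(y(3:2:M-1)) + y(M+1));
fprintf('Phi(5) is approximately %.7f\n', Phi5);
```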


7.1 Introduction to Quadrature

We now approach the subject of numerical integration. The goal is to approximate the
definite integral of $f(x)$ over the interval $[a, b]$ by evaluating $f(x)$ at a finite number
of sample points.

Definition 7.1. Suppose that $a = x_0 < x_1 < \cdots < x_M = b$. A formula of the form

$$Q[f] = \sum_{k=0}^{M} w_k f(x_k) = w_0 f(x_0) + w_1 f(x_1) + \cdots + w_M f(x_M) \tag{1}$$

with the property that

$$\int_a^b f(x)\,dx = Q[f] + E[f] \tag{2}$$

is called a numerical integration or quadrature formula. The term $E[f]$ is called the
truncation error for integration. The values $\{x_k\}_{k=0}^{M}$ are called the quadrature nodes
and $\{w_k\}_{k=0}^{M}$ are called the weights.


Depending on the application, the nodes $\{x_k\}$ are chosen in various ways. For the
trapezoidal rule, Simpson's rule, and Boole's rule, the nodes are chosen to be equally
spaced. For Gauss-Legendre quadrature, the nodes are chosen to be zeros of certain
Legendre polynomials. When the integration formula is used to develop a predictor
formula for differential equations, all the nodes are chosen less than $b$. For all applications,
it is necessary to know something about the accuracy of the numerical solution.


Definition 7.2. The degree of precision of a quadrature formula is the positive integer
$n$ such that $E[P_i] = 0$ for all polynomials $P_i(x)$ of degree $i \le n$, but for which
$E[P_{n+1}] \ne 0$ for some polynomial $P_{n+1}(x)$ of degree $n + 1$.

The form of $E[P_i]$ can be anticipated by studying what happens when $f(x)$ is a
polynomial. Consider the arbitrary polynomial

$$P_i(x) = a_i x^i + a_{i-1}x^{i-1} + \cdots + a_1 x + a_0$$

of degree $i$. If $i \le n$, then $P_i^{(n+1)}(x) \equiv 0$ for all $x$, and $P_{n+1}^{(n+1)}(x) = (n + 1)!\,a_{n+1}$ for
all $x$. Thus it is not surprising that the general form for the truncation error term is

$$E[f] = K f^{(n+1)}(c), \tag{3}$$

where $K$ is a suitably chosen constant and $n$ is the degree of precision. The proof of
this general result can be found in advanced books on numerical integration.

The derivation of quadrature formulas is sometimes based on polynomial interpolation.
Recall that there exists a unique polynomial $P_M(x)$ of degree $\le M$ passing
through the $M + 1$ equally spaced points $\{(x_k, y_k)\}_{k=0}^{M}$. When this polynomial is used
to approximate $f(x)$ over $[a, b]$, and then the integral of $f(x)$ is approximated by the
integral of $P_M(x)$, the resulting formula is called a Newton-Cotes quadrature formula
(see Figure 7.2). When the sample points $x_0 = a$ and $x_M = b$ are used, it is called a
closed Newton-Cotes formula. The next result gives the formulas when approximating
polynomials of degree $M = 1, 2, 3,$ and $4$ are used.


Theorem 7.1 (Closed Newton-Cotes Quadrature Formula). Assume that $x_k = x_0 + kh$
are equally spaced nodes and $f_k = f(x_k)$. The first four closed Newton-Cotes
quadrature formulas are

$$\int_{x_0}^{x_1} f(x)\,dx \approx \frac{h}{2}(f_0 + f_1) \quad \text{(the trapezoidal rule)}, \tag{4}$$

$$\int_{x_0}^{x_2} f(x)\,dx \approx \frac{h}{3}(f_0 + 4f_1 + f_2) \quad \text{(Simpson's rule)}, \tag{5}$$

$$\int_{x_0}^{x_3} f(x)\,dx \approx \frac{3h}{8}(f_0 + 3f_1 + 3f_2 + f_3) \quad \text{(Simpson's } \tfrac{3}{8} \text{ rule)}, \tag{6}$$

$$\int_{x_0}^{x_4} f(x)\,dx \approx \frac{2h}{45}(7f_0 + 32f_1 + 12f_2 + 32f_3 + 7f_4) \quad \text{(Boole's rule)}. \tag{7}$$

Figure 7.2 (a) The trapezoidal rule integrates $y = P_1(x)$ over $[x_0, x_1] = [0.0, 0.5]$.
(b) Simpson's rule integrates $y = P_2(x)$ over $[x_0, x_2] = [0.0, 1.0]$. (c) Simpson's $\frac{3}{8}$ rule
integrates $y = P_3(x)$ over $[x_0, x_3] = [0.0, 1.5]$. (d) Boole's rule integrates $y = P_4(x)$
over $[x_0, x_4] = [0.0, 2.0]$.


Corollary 7.1 (Newton-Cotes Precision). Assume that $f(x)$ is sufficiently differentiable;
then $E[f]$ for Newton-Cotes quadrature involves an appropriate higher derivative.
The trapezoidal rule has degree of precision $n = 1$. If $f \in C^2[a, b]$, then

$$\int_{x_0}^{x_1} f(x)\,dx = \frac{h}{2}(f_0 + f_1) - \frac{h^3}{12}f^{(2)}(c). \tag{8}$$

Simpson's rule has degree of precision $n = 3$. If $f \in C^4[a, b]$, then

$$\int_{x_0}^{x_2} f(x)\,dx = \frac{h}{3}(f_0 + 4f_1 + f_2) - \frac{h^5}{90}f^{(4)}(c). \tag{9}$$

Simpson's $\frac{3}{8}$ rule has degree of precision $n = 3$. If $f \in C^4[a, b]$, then

$$\int_{x_0}^{x_3} f(x)\,dx = \frac{3h}{8}(f_0 + 3f_1 + 3f_2 + f_3) - \frac{3h^5}{80}f^{(4)}(c). \tag{10}$$

Boole's rule has degree of precision $n = 5$. If $f \in C^6[a, b]$, then

$$\int_{x_0}^{x_4} f(x)\,dx = \frac{2h}{45}(7f_0 + 32f_1 + 12f_2 + 32f_3 + 7f_4) - \frac{8h^7}{945}f^{(6)}(c). \tag{11}$$

Proof of Theorem 7.1. Start with the Lagrange polynomial $P_M(x)$ based on $x_0, x_1, \ldots, x_M$
that can be used to approximate $f(x)$:

$$f(x) \approx P_M(x) = \sum_{k=0}^{M} f_k L_{M,k}(x), \tag{12}$$

where $f_k = f(x_k)$ for $k = 0, 1, \ldots, M$. An approximation for the integral is obtained
by replacing the integrand $f(x)$ with the polynomial $P_M(x)$. This is the general
method for obtaining a Newton-Cotes integration formula:

$$\int_{x_0}^{x_M} f(x)\,dx \approx \int_{x_0}^{x_M} P_M(x)\,dx = \int_{x_0}^{x_M}\left(\sum_{k=0}^{M} f_k L_{M,k}(x)\right)dx = \sum_{k=0}^{M}\left(\int_{x_0}^{x_M} f_k L_{M,k}(x)\,dx\right)$$

$$= \sum_{k=0}^{M}\left(\int_{x_0}^{x_M} L_{M,k}(x)\,dx\right)f_k = \sum_{k=0}^{M} w_k f_k. \tag{13}$$

The details for the general proof of (13) are tedious. We shall give a sample proof
of Simpson's rule, which is the case $M = 2$. This case involves the approximating
polynomial

$$P_2(x) = f_0\frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)} + f_1\frac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)} + f_2\frac{(x - x_0)(x - x_1)}{(x_2 - x_0)(x_2 - x_1)}. \tag{14}$$

Since $f_0$, $f_1$, and $f_2$ are constants with respect to integration, the relations in (13) lead
to

$$\int_{x_0}^{x_2} f(x)\,dx \approx f_0\int_{x_0}^{x_2}\frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)}\,dx + f_1\int_{x_0}^{x_2}\frac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)}\,dx + f_2\int_{x_0}^{x_2}\frac{(x - x_0)(x - x_1)}{(x_2 - x_0)(x_2 - x_1)}\,dx. \tag{15}$$

We introduce the change of variable $x = x_0 + ht$ with $dx = h\,dt$ to assist with
the evaluation of the integrals in (15). The new limits of integration are from $t = 0$ to
$t = 2$. The equal spacing of the nodes $x_k = x_0 + kh$ leads to $x_k - x_j = (k - j)h$ and
$x - x_k = h(t - k)$, which are used to simplify (15) and get

$$\int_{x_0}^{x_2} f(x)\,dx \approx f_0\int_0^2\frac{h(t - 1)h(t - 2)}{(-h)(-2h)}\,h\,dt + f_1\int_0^2\frac{h(t - 0)h(t - 2)}{(h)(-h)}\,h\,dt + f_2\int_0^2\frac{h(t - 0)h(t - 1)}{(2h)(h)}\,h\,dt$$

$$= f_0\frac{h}{2}\int_0^2(t^2 - 3t + 2)\,dt - f_1 h\int_0^2(t^2 - 2t)\,dt + f_2\frac{h}{2}\int_0^2(t^2 - t)\,dt \tag{16}$$

$$= f_0\frac{h}{2}\left(\frac{2}{3}\right) - f_1 h\left(\frac{-4}{3}\right) + f_2\frac{h}{2}\left(\frac{2}{3}\right)$$

$$= \frac{h}{3}(f_0 + 4f_1 + f_2),$$


and the proof is complete. We postpone a sample proof of Corollary 7.1 until Section 7.2.

Example 7.1. Consider the function $f(x) = 1 + e^{-x}\sin(4x)$, the equally spaced quadrature
nodes $x_0 = 0.0$, $x_1 = 0.5$, $x_2 = 1.0$, $x_3 = 1.5$, and $x_4 = 2.0$, and the corresponding
function values $f_0 = 1.00000$, $f_1 = 1.55152$, $f_2 = 0.72159$, $f_3 = 0.93765$, and
$f_4 = 1.13390$. Apply the various quadrature formulas (4) through (7).

The step size is $h = 0.5$, and the computations are

$$\int_0^{0.5} f(x)\,dx \approx \frac{0.5}{2}(1.00000 + 1.55152) = 0.63788$$

$$\int_0^{1.0} f(x)\,dx \approx \frac{0.5}{3}(1.00000 + 4(1.55152) + 0.72159) = 1.32128$$

$$\int_0^{1.5} f(x)\,dx \approx \frac{3(0.5)}{8}(1.00000 + 3(1.55152) + 3(0.72159) + 0.93765) = 1.64193$$

$$\int_0^{2.0} f(x)\,dx \approx \frac{2(0.5)}{45}(7(1.00000) + 32(1.55152) + 12(0.72159) + 32(0.93765) + 7(1.13390)) = 2.29444. \;■$$
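
The four computations of Example 7.1 can be checked with a few lines of MATLAB. This is a minimal sketch; the variable names T, S, S38, and B are illustrative, and the function values are computed in double precision rather than rounded to five decimals.

```matlab
% Closed Newton-Cotes rules (4)-(7) with h = 0.5 for Example 7.1.
f = @(x) 1 + exp(-x).*sin(4*x);
h = 0.5;
y = f(0:h:2);                            % y(k+1) holds f_k, k = 0,...,4
T   = h/2    * (y(1) + y(2));                                   % (4) on [0, 0.5]
S   = h/3    * (y(1) + 4*y(2) + y(3));                          % (5) on [0, 1.0]
S38 = 3*h/8  * (y(1) + 3*y(2) + 3*y(3) + y(4));                 % (6) on [0, 1.5]
B   = 2*h/45 * (7*y(1) + 32*y(2) + 12*y(3) + 32*y(4) + 7*y(5)); % (7) on [0, 2.0]
fprintf('%.5f  %.5f  %.5f  %.5f\n', T, S, S38, B);
```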

It is important to realize that the quadrature formulas (4) through (7) applied in the
above illustration give approximations for definite integrals over different intervals.
The graph of the curve $y = f(x)$ and the areas under the Lagrange polynomials $y = P_1(x)$,
$y = P_2(x)$, $y = P_3(x)$, and $y = P_4(x)$ are shown in Figure 7.2(a) through (d),
respectively.

In Example 7.1 we applied the quadrature rules with $h = 0.5$. If the end points
of the interval $[a, b]$ are held fixed, the step size must be adjusted for each rule. The
step sizes are $h = b - a$, $h = (b - a)/2$, $h = (b - a)/3$, and $h = (b - a)/4$ for the
trapezoidal rule, Simpson's rule, Simpson's $\frac{3}{8}$ rule, and Boole's rule, respectively. The
next example illustrates this point.

Example 7.2. Consider the integration of the function $f(x) = 1 + e^{-x}\sin(4x)$ over the
fixed interval $[a, b] = [0, 1]$. Apply the various formulas (4) through (7).

For the trapezoidal rule, $h = 1$ and

$$\int_0^1 f(x)\,dx \approx \frac{1}{2}(f(0) + f(1)) = \frac{1}{2}(1.00000 + 0.72159) = 0.86079.$$

For Simpson's rule, $h = 1/2$, and we get

$$\int_0^1 f(x)\,dx \approx \frac{1/2}{3}\left(f(0) + 4f(\tfrac{1}{2}) + f(1)\right) = \frac{1}{6}(1.00000 + 4(1.55152) + 0.72159) = 1.32128.$$

For Simpson's $\frac{3}{8}$ rule, $h = 1/3$, and we obtain

$$\int_0^1 f(x)\,dx \approx \frac{3(1/3)}{8}\left(f(0) + 3f(\tfrac{1}{3}) + 3f(\tfrac{2}{3}) + f(1)\right) = \frac{1}{8}(1.00000 + 3(1.69642) + 3(1.23477) + 0.72159) = 1.31440.$$
Figure 7.3 (a) The trapezoidal rule used over $[0, 1]$ yields the approximation 0.86079.
(b) Simpson's rule used over $[0, 1]$ yields the approximation 1.32128. (c) Simpson's $\frac{3}{8}$
rule used over $[0, 1]$ yields the approximation 1.31440. (d) Boole's rule used over $[0, 1]$
yields the approximation 1.30859.


For Boole's rule, $h = 1/4$, and the result is

$$\int_0^1 f(x)\,dx \approx \frac{2(1/4)}{45}\left(7f(0) + 32f(\tfrac{1}{4}) + 12f(\tfrac{1}{2}) + 32f(\tfrac{3}{4}) + 7f(1)\right)$$

$$= \frac{1}{90}(7(1.00000) + 32(1.65534) + 12(1.55152) + 32(1.06666) + 7(0.72159)) = 1.30859.$$

The true value of the definite integral is

$$\int_0^1 f(x)\,dx = \frac{21e - 4\cos(4) - \sin(4)}{17e} = 1.3082506046426\ldots,$$

and the approximation 1.30859 from Boole's rule is best. The area under each of the Lagrange
polynomials $P_1(x)$, $P_2(x)$, $P_3(x)$, and $P_4(x)$ is shown in Figure 7.3(a) through (d),
respectively. ■
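
Example 7.2 can be sketched in MATLAB as follows. The point of the sketch is that h is recomputed for each rule so that every rule spans the whole interval [a, b]; the variable names are illustrative, and the closed form for the exact integral is the one quoted in the example.

```matlab
% Rules (4)-(7) on the fixed interval [0,1] (Example 7.2); the step
% size changes with the rule so that each one spans the whole interval.
f = @(x) 1 + exp(-x).*sin(4*x);
a = 0;  b = 1;
h = b - a;      T   = h/2    * (f(a) + f(b));
h = (b - a)/2;  S   = h/3    * (f(a) + 4*f(a+h) + f(b));
h = (b - a)/3;  S38 = 3*h/8  * (f(a) + 3*f(a+h) + 3*f(a+2*h) + f(b));
h = (b - a)/4;  B   = 2*h/45 * (7*f(a) + 32*f(a+h) + 12*f(a+2*h) ...
                                + 32*f(a+3*h) + 7*f(b));
exact = (21*exp(1) - 4*cos(4) - sin(4)) / (17*exp(1));
fprintf('%.5f  %.5f  %.5f  %.5f  (exact %.13f)\n', T, S, S38, B, exact);
```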

To make a fair comparison of quadrature methods, we must use the same number of
function evaluations in each method. Our final example is concerned with comparing
integration over a fixed interval $[a, b]$ using exactly five function evaluations $f_k = f(x_k)$,
for $k = 0, 1, \ldots, 4$ for each method. When the trapezoidal rule is applied on
the four subintervals $[x_0, x_1]$, $[x_1, x_2]$, $[x_2, x_3]$, and $[x_3, x_4]$, it is called a composite
trapezoidal rule:

$$\int_{x_0}^{x_4} f(x)\,dx = \int_{x_0}^{x_1} f(x)\,dx + \int_{x_1}^{x_2} f(x)\,dx + \int_{x_2}^{x_3} f(x)\,dx + \int_{x_3}^{x_4} f(x)\,dx$$

$$\approx \frac{h}{2}(f_0 + f_1) + \frac{h}{2}(f_1 + f_2) + \frac{h}{2}(f_2 + f_3) + \frac{h}{2}(f_3 + f_4) \tag{17}$$

$$= \frac{h}{2}(f_0 + 2f_1 + 2f_2 + 2f_3 + f_4).$$

Simpson's rule can also be used in this manner. When Simpson's rule is applied on the
two subintervals $[x_0, x_2]$ and $[x_2, x_4]$, it is called a composite Simpson's rule:

$$\int_{x_0}^{x_4} f(x)\,dx = \int_{x_0}^{x_2} f(x)\,dx + \int_{x_2}^{x_4} f(x)\,dx$$

$$\approx \frac{h}{3}(f_0 + 4f_1 + f_2) + \frac{h}{3}(f_2 + 4f_3 + f_4) \tag{18}$$

$$= \frac{h}{3}(f_0 + 4f_1 + 2f_2 + 4f_3 + f_4).$$

The next example compares the values obtained with (17), (18), and (7).

Example 7.3. Consider the integration of the function $f(x) = 1 + e^{-x}\sin(4x)$ over
$[a, b] = [0, 1]$. Use exactly five function evaluations and compare the results from the
composite trapezoidal rule, composite Simpson rule, and Boole's rule.

The uniform step size is $h = 1/4$. The composite trapezoidal rule (17) produces

$$
\begin{aligned}
\int_0^1 f(x)\,dx &\approx \frac{1/4}{2}\bigl(f(0) + 2f(\tfrac14) + 2f(\tfrac12) + 2f(\tfrac34) + f(1)\bigr) \\
&= \frac{1}{8}\bigl(1.00000 + 2(1.65534) + 2(1.55152) + 2(1.06666) + 0.72159\bigr) = 1.28358.
\end{aligned}
$$

Using the composite Simpson's rule (18), we get

$$
\begin{aligned}
\int_0^1 f(x)\,dx &\approx \frac{1/4}{3}\bigl(f(0) + 4f(\tfrac14) + 2f(\tfrac12) + 4f(\tfrac34) + f(1)\bigr) \\
&= \frac{1}{12}\bigl(1.00000 + 4(1.65534) + 2(1.55152) + 4(1.06666) + 0.72159\bigr) = 1.30938.
\end{aligned}
$$

We have already seen the result of Boole's rule in Example 7.2:

$$
\int_0^1 f(x)\,dx \approx \frac{2(1/4)}{45}\bigl(7f(0) + 32f(\tfrac14) + 12f(\tfrac12) + 32f(\tfrac34) + 7f(1)\bigr) = 1.30859.
$$

Figure 7.4 (a) The composite trapezoidal rule yields the approximation 1.28358.
(b) The composite Simpson rule yields the approximation 1.30938.

The true value of the integral is

$$
\int_0^1 f(x)\,dx = \frac{21e - 4\cos(4) - \sin(4)}{17e} = 1.3082506046426\ldots,
$$

and the approximation 1.30938 from Simpson's rule is much better than the value 1.28358
obtained from the trapezoidal rule. Again, the approximation 1.30859 from Boole's rule is
closest. Graphs for the areas under the trapezoids and parabolas are shown in Figure 7.4(a)
and (b), respectively. ■
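
The comparison in Example 7.3 can be reproduced with a few lines of MATLAB. The following is only a sketch: it assumes the integrand is supplied as an anonymous function (rather than as a string, as in the programs of this chapter), and the variable names are illustrative.

```matlab
% Compare three rules on [0,1] using the same five samples of f.
f  = @(x) 1 + exp(-x).*sin(4*x);         % integrand of Example 7.3
a  = 0;  b = 1;
h  = (b - a)/4;                          % uniform step size h = 1/4
fx = f(linspace(a, b, 5));               % f_0, f_1, ..., f_4

T = h/2    * sum([1 2 2 2 1]    .* fx);  % composite trapezoidal rule (17)
S = h/3    * sum([1 4 2 4 1]    .* fx);  % composite Simpson rule (18)
B = 2*h/45 * sum([7 32 12 32 7] .* fx);  % Boole's rule (7)

exact = (21*exp(1) - 4*cos(4) - sin(4))/(17*exp(1));
fprintf('T = %.5f  S = %.5f  B = %.5f  exact = %.5f\n', T, S, B, exact);
```

All three rules reuse the same vector of samples; only the weight vector changes, which is what makes the comparison fair.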


Example 7.4. Determine the degree of precision of Simpson's 3/8 rule.

It will suffice to apply Simpson's 3/8 rule over the interval [0, 3] with the five test
functions $f(x) = 1, x, x^2, x^3$, and $x^4$. For the first four functions, Simpson's 3/8
rule is exact:

$$
\begin{aligned}
\int_0^3 1\,dx &= 3 = \frac{3}{8}\bigl(1 + 3(1) + 3(1) + 1\bigr), \\
\int_0^3 x\,dx &= \frac{9}{2} = \frac{3}{8}\bigl(0 + 3(1) + 3(2) + 3\bigr), \\
\int_0^3 x^2\,dx &= 9 = \frac{3}{8}\bigl(0 + 3(1) + 3(4) + 9\bigr), \\
\int_0^3 x^3\,dx &= \frac{81}{4} = \frac{3}{8}\bigl(0 + 3(1) + 3(8) + 27\bigr).
\end{aligned}
$$

The function $f(x) = x^4$ is the lowest power of $x$ for which the rule is not exact:

$$
\int_0^3 x^4\,dx = \frac{243}{5} \approx \frac{99}{2} = \frac{3}{8}\bigl(0 + 3(1) + 3(16) + 81\bigr).
$$

Therefore, the degree of precision of Simpson's 3/8 rule is $n = 3$. ■
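
The check in Example 7.4 is mechanical, so it can be repeated numerically. The sketch below assumes the nodes 0, 1, 2, 3 and the weights (3/8)[1 3 3 1] for h = 1; the loop variable and output format are illustrative only.

```matlab
% Apply Simpson's 3/8 rule on [0,3] (h = 1) to the monomials x^p.
x = 0:3;                          % nodes 0, 1, 2, 3
w = (3/8)*[1 3 3 1];              % Simpson 3/8 weights for h = 1
for p = 0:4
    approx = sum(w .* x.^p);      % value given by the rule
    exact  = 3^(p+1)/(p+1);       % integral of x^p over [0,3]
    fprintf('p = %d  rule = %8.4f  exact = %8.4f  error = %g\n', ...
            p, approx, exact, exact - approx);
end
```

The error column is zero for p = 0, 1, 2, 3 and nonzero for p = 4, in agreement with the degree of precision n = 3.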
Exercises for Introduction to Quadrature

1. Consider integration of $f(x)$ over the fixed interval $[a, b] = [0, 1]$. Apply the various
quadrature formulas (4) through (7). The step sizes are $h = 1$, $h = \tfrac12$, $h = \tfrac13$, and
$h = \tfrac14$ for the trapezoidal rule, Simpson's rule, Simpson's 3/8 rule, and Boole's rule,
respectively.

(a) $f(x) = \sin(\pi x)$

(b) $f(x) = 1 + e^{-x}\cos(4x)$

(c) $f(x) = \sin(\sqrt{x})$

Remark. The true values of the definite integrals are (a) $2/\pi = 0.636619772367\ldots$,
(b) $(18e - \cos(4) + 4\sin(4))/(17e) = 1.007459631397\ldots$, and (c) $2(\sin(1) -
\cos(1)) = 0.602337357879\ldots$. Graphs of the functions are shown in Figures 7.5(a)
through (c), respectively.

2. Consider integration of $f(x)$ over the fixed interval $[a, b] = [0, 1]$. Apply the various
quadrature formulas: the composite trapezoidal rule (17), the composite Simpson's
rule (18), and Boole's rule (7). Use five function evaluations at equally spaced nodes.
The uniform step size is $h = \tfrac14$.

(a) $f(x) = \sin(\pi x)$

(b) $f(x) = 1 + e^{-x}\cos(4x)$

(c) $f(x) = \sin(\sqrt{x})$

3. Consider a general interval $[a, b]$. Show that Simpson's rule produces exact results
for the functions $f(x) = x^2$ and $f(x) = x^3$; that is,

(a) $\int_a^b x^2\,dx = \dfrac{b^3 - a^3}{3}$    (b) $\int_a^b x^3\,dx = \dfrac{b^4 - a^4}{4}$

4. Integrate the Lagrange interpolation polynomial

$$
P_1(x) = f_0\,\frac{x - x_1}{x_0 - x_1} + f_1\,\frac{x - x_0}{x_1 - x_0}
$$

over the interval $[x_0, x_1]$ and establish the trapezoidal rule.

Figure 7.5 (a) $y = \sin(\pi x)$, (b) $y = 1 + e^{-x}\cos(4x)$, (c) $y = \sin(\sqrt{x})$.

5. Determine the degree of precision of the trapezoidal rule. It will suffice to apply the
trapezoidal rule over [0, 1] with the three test functions $f(x) = 1, x$, and $x^2$.

6. Determine the degree of precision of Simpson's rule. It will suffice to apply Simpson's
rule over [0, 2] with the five test functions $f(x) = 1, x, x^2, x^3$, and $x^4$. Contrast
your result with the degree of precision of Simpson's 3/8 rule.

7. Determine the degree of precision of Boole's rule. It will suffice to apply Boole's rule
over [0, 4] with the seven test functions $f(x) = 1, x, x^2, x^3, x^4, x^5$, and $x^6$.

8. The intervals in Exercises 5, 6, and 7 and Example 7.4 were selected to simplify the
calculation of the quadrature nodes. But, on any closed interval $[a, b]$ over which
the function $f$ is integrable, each of the four quadrature rules (4) through (7) has the
degree of precision determined in Exercises 5, 6, and 7 and Example 7.4, respectively.
A quadrature formula on the interval $[a, b]$ can be obtained from a quadrature formula
on the interval $[c, d]$ by making a change of variables with the linear function

$$
x = g(t) = \frac{b - a}{d - c}\,t + \frac{ad - bc}{d - c},
\qquad\text{where } dx = \frac{b - a}{d - c}\,dt.
$$

(a) Verify that $x = g(t)$ is the line passing through the points $(c, a)$ and $(d, b)$.

(b) Verify that the trapezoidal rule has the same degree of precision on the interval
$[a, b]$ as on the interval [0, 1].

(c) Verify that Simpson's rule has the same degree of precision on the interval $[a, b]$
as on the interval [0, 2].

(d) Verify that Boole's rule has the same degree of precision on the interval $[a, b]$
as on the interval [0, 4].

9. Derive Simpson's 3/8 rule using Lagrange polynomial interpolation. Hint. After changing
the variable, integrals of the following form are obtained:

$$
\begin{aligned}
\int_{x_0}^{x_3} f(x)\,dx \approx{}& -f_0\frac{h}{6}\int_0^3 (t-1)(t-2)(t-3)\,dt + f_1\frac{h}{2}\int_0^3 (t-0)(t-2)(t-3)\,dt \\
& - f_2\frac{h}{2}\int_0^3 (t-0)(t-1)(t-3)\,dt + f_3\frac{h}{6}\int_0^3 (t-0)(t-1)(t-2)\,dt \\
={}& f_0\frac{h}{6}\Bigl(-\frac{t^4}{4} + 2t^3 - \frac{11t^2}{2} + 6t\Bigr)\Big|_{t=0}^{t=3} + f_1\frac{h}{2}\Bigl(\frac{t^4}{4} - \frac{5t^3}{3} + 3t^2\Bigr)\Big|_{t=0}^{t=3} \\
& + f_2\frac{h}{2}\Bigl(-\frac{t^4}{4} + \frac{4t^3}{3} - \frac{3t^2}{2}\Bigr)\Big|_{t=0}^{t=3} + f_3\frac{h}{6}\Bigl(\frac{t^4}{4} - t^3 + t^2\Bigr)\Big|_{t=0}^{t=3}.
\end{aligned}
$$

10. Derive the closed Newton-Cotes quadrature formula, based on a Lagrange approximating
polynomial of degree 5, using the 6 equally spaced nodes $x_k = x_0 + kh$, where
$k = 0, 1, \ldots, 5$.








11. In the proof of Theorem 7.1, Simpson's rule was derived by integrating the second-degree
Lagrange polynomial based on the three equally spaced nodes $x_0$, $x_1$, and $x_2$.
Derive Simpson's rule by integrating the second-degree Newton polynomial based on
the three equally spaced nodes $x_0$, $x_1$, and $x_2$.


7.2 Composite Trapezoidal and Simpson's Rule

An intuitive method of finding the area under the curve $y = f(x)$ over $[a, b]$ is
by approximating that area with a series of trapezoids that lie above the intervals
$\{[x_k, x_{k+1}]\}$.

Theorem 7.2 (Composite Trapezoidal Rule). Suppose that the interval $[a, b]$ is
subdivided into $M$ subintervals $[x_k, x_{k+1}]$ of width $h = (b - a)/M$ by using the equally
spaced nodes $x_k = a + kh$, for $k = 0, 1, \ldots, M$. The composite trapezoidal rule for
$M$ subintervals can be expressed in any of three equivalent ways:

$$
T(f, h) = \frac{h}{2}\sum_{k=1}^{M}\bigl(f(x_{k-1}) + f(x_k)\bigr)
\tag{1a}
$$

or

$$
T(f, h) = \frac{h}{2}\bigl(f_0 + 2f_1 + 2f_2 + 2f_3 + \cdots + 2f_{M-2} + 2f_{M-1} + f_M\bigr)
\tag{1b}
$$

or

$$
T(f, h) = \frac{h}{2}\bigl(f(a) + f(b)\bigr) + h\sum_{k=1}^{M-1} f(x_k).
\tag{1c}
$$

This is an approximation to the integral of $f(x)$ over $[a, b]$, and we write

$$
\int_a^b f(x)\,dx \approx T(f, h).
\tag{2}
$$


Proof. Apply the trapezoidal rule over each subinterval $[x_{k-1}, x_k]$ (see Figure 7.6).
Use the additive property of the integral for subintervals:

$$
\int_a^b f(x)\,dx = \sum_{k=1}^{M}\int_{x_{k-1}}^{x_k} f(x)\,dx \approx \sum_{k=1}^{M}\frac{h}{2}\bigl(f(x_{k-1}) + f(x_k)\bigr).
\tag{3}
$$

Since $h/2$ is a constant, the distributive law of addition can be applied to obtain (1a).
Formula (1b) is the expanded version of (1a). Formula (1c) shows how to group all the
intermediate terms in (1b) that are multiplied by 2. •

Approximating $f(x) = 2 + \sin(2\sqrt{x})$ with piecewise linear polynomials results in
places where the approximation is close and places where it is not. To achieve accuracy
the composite trapezoidal rule must be applied with many subintervals. In the next
example we have chosen to numerically integrate this function over the interval [1, 6].
Investigation of the integral over [0, 1] is left as an exercise.

Example 7.5. Consider $f(x) = 2 + \sin(2\sqrt{x})$. Use the composite trapezoidal rule with
11 sample points to compute an approximation to the integral of $f(x)$ taken over [1, 6].

To generate 11 sample points, we use $M = 10$ and $h = (6 - 1)/10 = 1/2$. Using
formula (1c), the computation is

$$
\begin{aligned}
T(f, \tfrac12) ={}& \frac{1/2}{2}\bigl(f(1) + f(6)\bigr) \\
& + \frac12\bigl(f(\tfrac32) + f(2) + f(\tfrac52) + f(3) + f(\tfrac72) + f(4) + f(\tfrac92) + f(5) + f(\tfrac{11}{2})\bigr) \\
={}& \frac14(2.90929743 + 1.01735756) \\
& + \frac12(2.63815764 + 2.30807174 + 1.97931647 + 1.68305284 + 1.43530410 \\
& \qquad + 1.24319750 + 1.10831775 + 1.02872220 + 1.00024140) \\
={}& \frac14(3.92665499) + \frac12(14.42438165) \\
={}& 0.98166375 + 7.21219083 = 8.19385457.
\end{aligned}
$$
■
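
The hand computation of Example 7.5 can be checked with a short vectorized sketch of formula (1c). It assumes the integrand is given as an anonymous function; the variable names are illustrative and are not those of Program 7.1.

```matlab
% Composite trapezoidal rule, formula (1c), for Example 7.5.
f  = @(x) 2 + sin(2*sqrt(x));
a  = 1;  b = 6;  M = 10;
h  = (b - a)/M;                   % h = 1/2
fx = f(a + h*(0:M));              % samples f(x_0), ..., f(x_M)
T  = h/2*(fx(1) + fx(end)) + h*sum(fx(2:end-1));
fprintf('T(f, %g) = %.8f\n', h, T);
```

The end points carry weight h/2 and every interior node carries weight h, exactly as in (1c).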

Theorem 7.3 (Composite Simpson Rule). Suppose that $[a, b]$ is subdivided into
$2M$ subintervals $[x_k, x_{k+1}]$ of equal width $h = (b - a)/(2M)$ by using $x_k = a + kh$ for
$k = 0, 1, \ldots, 2M$. The composite Simpson rule for $2M$ subintervals can be expressed
in any of three equivalent ways:

$$
S(f, h) = \frac{h}{3}\sum_{k=1}^{M}\bigl(f(x_{2k-2}) + 4f(x_{2k-1}) + f(x_{2k})\bigr)
\tag{4a}
$$

or

$$
S(f, h) = \frac{h}{3}\bigl(f_0 + 4f_1 + 2f_2 + 4f_3 + \cdots + 2f_{2M-2} + 4f_{2M-1} + f_{2M}\bigr)
\tag{4b}
$$

or

$$
S(f, h) = \frac{h}{3}\bigl(f(a) + f(b)\bigr) + \frac{2h}{3}\sum_{k=1}^{M-1} f(x_{2k}) + \frac{4h}{3}\sum_{k=1}^{M} f(x_{2k-1}).
\tag{4c}
$$

This is an approximation to the integral of $f(x)$ over $[a, b]$, and we write

$$
\int_a^b f(x)\,dx \approx S(f, h).
\tag{5}
$$

Figure 7.7 Approximating the area under the curve $y = 2 + \sin(2\sqrt{x})$
with the composite Simpson rule.

Proof. Apply Simpson's rule over each subinterval $[x_{2k-2}, x_{2k}]$ (see Figure 7.7). Use
the additive property of the integral for subintervals:

$$
\int_a^b f(x)\,dx = \sum_{k=1}^{M}\int_{x_{2k-2}}^{x_{2k}} f(x)\,dx \approx \sum_{k=1}^{M}\frac{h}{3}\bigl(f(x_{2k-2}) + 4f(x_{2k-1}) + f(x_{2k})\bigr).
\tag{6}
$$

Since $h/3$ is a constant, the distributive law of addition can be applied to obtain (4a).
Formula (4b) is the expanded version of (4a). Formula (4c) groups all
the intermediate terms in (4b) that are multiplied by 2 and those that are multiplied
by 4. •

Approximating $f(x) = 2 + \sin(2\sqrt{x})$ with piecewise quadratic polynomials produces
places where the approximation is close and places where it is not. To achieve
accuracy the composite Simpson rule must be applied with several subintervals. In
the next example we have chosen to numerically integrate this function over [1, 6] and
leave investigation of the integral over [0, 1] as an exercise.

Example 7.6. Consider $f(x) = 2 + \sin(2\sqrt{x})$. Use the composite Simpson rule with 11
sample points to compute an approximation to the integral of $f(x)$ taken over [1, 6].

To generate 11 sample points, we must use $M = 5$ and $h = (6 - 1)/10 = 1/2$. Using
formula (4c), the computation is

$$
\begin{aligned}
S(f, \tfrac12) ={}& \frac16\bigl(f(1) + f(6)\bigr) + \frac13\bigl(f(2) + f(3) + f(4) + f(5)\bigr) \\
& + \frac23\bigl(f(\tfrac32) + f(\tfrac52) + f(\tfrac72) + f(\tfrac92) + f(\tfrac{11}{2})\bigr) \\
={}& \frac16(2.90929743 + 1.01735756) \\
& + \frac13(2.30807174 + 1.68305284 + 1.24319750 + 1.02872220) \\
& + \frac23(2.63815764 + 1.97931647 + 1.43530410 + 1.10831775 + 1.00024140) \\
={}& \frac16(3.92665499) + \frac13(6.26304429) + \frac23(8.16133735) \\
={}& 0.65444250 + 2.08768143 + 5.44089157 = 8.18301550.
\end{aligned}
$$
■
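
Formula (4c) translates directly into vectorized MATLAB. The sketch below reworks Example 7.6 under the assumption that the integrand is an anonymous function; the names `odd` and `even` are illustrative.

```matlab
% Composite Simpson rule, formula (4c), for Example 7.6.
f    = @(x) 2 + sin(2*sqrt(x));
a    = 1;  b = 6;  M = 5;
h    = (b - a)/(2*M);             % 2M = 10 subintervals, h = 1/2
fx   = f(a + h*(0:2*M));          % fx(k+1) holds f(x_k), k = 0..2M
odd  = fx(2:2:end-1);             % f(x_1), f(x_3), ..., f(x_{2M-1})
even = fx(3:2:end-2);             % f(x_2), f(x_4), ..., f(x_{2M-2})
S    = h/3*(fx(1) + fx(end)) + 2*h/3*sum(even) + 4*h/3*sum(odd);
fprintf('S(f, %g) = %.8f\n', h, S);
```

Because MATLAB indices start at 1, the node x_k is stored in position k + 1; the odd-indexed nodes therefore occupy the even positions of `fx`.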


Error Analysis 


The significance of the next two results is to understand that the error terms $E_T(f, h)$
and $E_S(f, h)$ for the composite trapezoidal rule and composite Simpson rule are of
the order $O(h^2)$ and $O(h^4)$, respectively. This shows that the error for Simpson's
rule converges to zero faster than the error for the trapezoidal rule as the step size $h$
decreases to zero. In cases where the derivatives of $f(x)$ are known, the formulas

$$
E_T(f, h) = \frac{-(b - a)f^{(2)}(c)h^2}{12}
\qquad\text{and}\qquad
E_S(f, h) = \frac{-(b - a)f^{(4)}(c)h^4}{180}
$$

can be used to estimate the number of subintervals required to achieve a specified
accuracy.

Now we are ready to add up the error terms for all of the intervals $[x_k, x_{k+1}]$:

$$
\int_a^b f(x)\,dx = \sum_{k=1}^{M}\int_{x_{k-1}}^{x_k} f(x)\,dx
= \sum_{k=1}^{M}\frac{h}{2}\bigl(f(x_{k-1}) + f(x_k)\bigr) - \frac{h^3}{12}\sum_{k=1}^{M} f^{(2)}(c_k).
$$

The first sum is the composite trapezoidal rule $T(f, h)$. In the second term, one factor
of $h$ is replaced with its equivalent $h = (b - a)/M$, and the result is

$$
\int_a^b f(x)\,dx = T(f, h) - \frac{(b - a)h^2}{12}\left(\frac{1}{M}\sum_{k=1}^{M} f^{(2)}(c_k)\right).
$$

The term in parentheses can be recognized as an average of values for the second
derivative and hence is replaced by $f^{(2)}(c)$. Therefore, we have established that

$$
\int_a^b f(x)\,dx = T(f, h) - \frac{(b - a)f^{(2)}(c)h^2}{12},
$$

and the proof of Corollary 7.2 is complete. •

Corollary 7.3 (Simpson's Rule: Error Analysis). Suppose that $[a, b]$ is subdivided
into $2M$ subintervals $[x_k, x_{k+1}]$ of equal width $h = (b - a)/(2M)$. The composite
Simpson rule

$$
S(f, h) = \frac{h}{3}\bigl(f(a) + f(b)\bigr) + \frac{2h}{3}\sum_{k=1}^{M-1} f(x_{2k}) + \frac{4h}{3}\sum_{k=1}^{M} f(x_{2k-1})
\tag{14}
$$

is an approximation to the integral

$$
\int_a^b f(x)\,dx = S(f, h) + E_S(f, h).
\tag{15}
$$

Furthermore, if $f \in C^4[a, b]$, there exists a value $c$ with $a < c < b$ so that the error
term $E_S(f, h)$ has the form

$$
E_S(f, h) = \frac{-(b - a)f^{(4)}(c)h^4}{180} = O(h^4).
\tag{16}
$$


Example 7.7. Consider $f(x) = 2 + \sin(2\sqrt{x})$. Investigate the error when the composite
trapezoidal rule is used over [1, 6] and the number of subintervals is 10, 20, 40, 80,
and 160.

Table 7.2 The Composite Trapezoidal Rule for
$f(x) = 2 + \sin(2\sqrt{x})$ over [1, 6]

  M      h          T(f, h)       E_T(f, h) = O(h^2)
  10     0.5        8.19385457    -0.01037540
  20     0.25       8.18604926    -0.00257006
  40     0.125      8.18412019    -0.00064098
  80     0.0625     8.18363936    -0.00016015
  160    0.03125    8.18351924    -0.00004003

Table 7.2 shows the approximations $T(f, h)$. The antiderivative of $f(x)$ is

$$
F(x) = 2x - \sqrt{x}\cos(2\sqrt{x}) + \frac{\sin(2\sqrt{x})}{2},
$$

and the true value of the definite integral is

$$
\int_1^6 f(x)\,dx = F(x)\Big|_{x=1}^{x=6} = 8.1834792077.
$$

This value was used to compute the values $E_T(f, h) = 8.1834792077 - T(f, h)$ in
Table 7.2. It is important to observe that when $h$ is reduced by a factor of $\tfrac12$ the successive
errors $E_T(f, h)$ are diminished by approximately $\tfrac14$. This confirms that the order is $O(h^2)$. ■
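
The error column of Table 7.2 and the ratio of successive errors can be regenerated with the following sketch. It assumes the antiderivative F(x) = 2x − √x cos(2√x) + sin(2√x)/2 for the true value; the array names are illustrative.

```matlab
% Observe the O(h^2) behaviour of the composite trapezoidal rule.
f = @(x) 2 + sin(2*sqrt(x));
F = @(x) 2*x - sqrt(x).*cos(2*sqrt(x)) + sin(2*sqrt(x))/2;   % antiderivative
a = 1;  b = 6;
exact = F(b) - F(a);
Ms = [10 20 40 80 160];
E  = zeros(size(Ms));
for i = 1:numel(Ms)
    M  = Ms(i);
    h  = (b - a)/M;
    fx = f(a + h*(0:M));
    T  = h/2*(fx(1) + fx(end)) + h*sum(fx(2:end-1));
    E(i) = exact - T;             % E_T(f, h)
end
ratios = E(2:end)./E(1:end-1);    % each ratio should be close to 1/4
disp([Ms' E']);
disp(ratios);
```

Halving h multiplies the error by roughly 1/4, which is the numerical signature of a second-order method.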


Example 7.8. Consider $f(x) = 2 + \sin(2\sqrt{x})$. Investigate the error when the composite
Simpson rule is used over [1, 6] and the number of subintervals is 10, 20, 40, 80, and 160.

Table 7.3 shows the approximations $S(f, h)$. The true value of the integral is
8.1834792077, which was used to compute the values $E_S(f, h) = 8.1834792077 - S(f, h)$
in Table 7.3. It is important to observe that when $h$ is reduced by a factor of $\tfrac12$ the successive
errors $E_S(f, h)$ are diminished by approximately $\tfrac1{16}$. This confirms that the order is $O(h^4)$. ■

Table 7.3 The Composite Simpson Rule for
$f(x) = 2 + \sin(2\sqrt{x})$ over [1, 6]

  M     h          S(f, h)       E_S(f, h) = O(h^4)
  5     0.5        8.18301549    0.00046371
  10    0.25       8.18344750    0.00003171
  20    0.125      8.18347717    0.00000204
  40    0.0625     8.18347908    0.00000013
  80    0.03125    8.18347920    0.00000001

Example 7.9. Find the number $M$ and the step size $h$ so that the error $E_T(f, h)$ for the
composite trapezoidal rule is less than $5 \times 10^{-9}$ for the approximation
$\int_2^7 dx/x \approx T(f, h)$.

The integrand is $f(x) = 1/x$ and its first two derivatives are $f'(x) = -1/x^2$ and
$f^{(2)}(x) = 2/x^3$. The maximum value of $|f^{(2)}(x)|$ taken over [2, 7] occurs at the end point
$x = 2$, and thus we have the bound $|f^{(2)}(c)| \le |f^{(2)}(2)| = \tfrac14$, for $2 \le c \le 7$. This is used
with formula (9) to obtain

$$
|E_T(f, h)| = \frac{|-(b - a)f^{(2)}(c)h^2|}{12} \le \frac{(7 - 2)\tfrac14 h^2}{12} = \frac{5h^2}{48}.
\tag{17}
$$

The step size $h$ and number $M$ satisfy the relation $h = 5/M$, and this is used in (17) to get
the relation

$$
|E_T(f, h)| \le \frac{125}{48M^2} \le 5 \times 10^{-9}.
\tag{18}
$$

Now rewrite (18) so that it is easier to solve for $M$:

$$
\frac{25}{48} \times 10^9 \le M^2.
\tag{19}
$$

Solving (19), we find that $22821.77 \le M$. Since $M$ must be an integer, we choose $M =
22{,}822$ and the corresponding step size is $h = 5/22{,}822 = 0.000219086846$. When the
composite trapezoidal rule is implemented with this many function evaluations, there is a
possibility that the rounded-off function evaluations will produce a significant amount of
error. When the computation was performed, the result was

$$
T\left(f, \frac{5}{22{,}822}\right) = 1.252762969,
$$

which compares favorably with the true value $\int_2^7 dx/x = \ln(x)\big|_{x=2}^{x=7} = 1.252762968$. The
error is smaller than predicted because the bound $\tfrac14$ for $|f^{(2)}(c)|$ was used. Experimentation
shows that it takes about 10,001 function evaluations to achieve the desired accuracy of
$5 \times 10^{-9}$, and when the calculation is performed with $M = 10{,}000$, the result is

$$
T\left(f, \frac{5}{10{,}000}\right) = 1.252762973.
$$
■
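
The bound used in Example 7.9 can be turned into a small calculation. The sketch below assumes the bound K2 = 1/4 on |f''| over [2, 7] and solves (b − a)·K2·h²/12 ≤ tol with h = (b − a)/M; the names `K2` and `tol` are illustrative.

```matlab
% Choose M from the trapezoidal error bound, then check the actual error.
a = 2;  b = 7;  tol = 5e-9;
K2 = 1/4;                                   % bound on |f''(x)| = 2/x^3 over [2,7]
M  = ceil(sqrt((b - a)^3*K2/(12*tol)));     % smallest integer satisfying the bound
h  = (b - a)/M;
f  = @(x) 1./x;
fx = f(a + h*(0:M));
T  = h/2*(fx(1) + fx(end)) + h*sum(fx(2:end-1));
err = log(b/a) - T;                         % true value is ln(7) - ln(2)
fprintf('M = %d  h = %.12f  T = %.9f  error = %.2e\n', M, h, T, err);
```

The bound is conservative because it uses the largest value of |f''| on the whole interval, so the observed error is well inside the tolerance.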


The composite trapezoidal rule usually requires a large number of function evaluations
to achieve an accurate answer. This is contrasted in the next example with
Simpson's rule, which will require significantly fewer evaluations.

Example 7.10. Find the number $M$ and the step size $h$ so that the error $E_S(f, h)$ for the
composite Simpson rule is less than $5 \times 10^{-9}$ for the approximation
$\int_2^7 dx/x \approx S(f, h)$.

The integrand is $f(x) = 1/x$, and $f^{(4)}(x) = 24/x^5$. The maximum value of $|f^{(4)}(x)|$
taken over [2, 7] occurs at the end point $x = 2$, and thus we have the bound $|f^{(4)}(c)| \le
|f^{(4)}(2)| = \tfrac34$ for $2 \le c \le 7$. This is used with formula (16) to obtain

$$
|E_S(f, h)| = \frac{|-(b - a)f^{(4)}(c)h^4|}{180} \le \frac{(7 - 2)\tfrac34 h^4}{180} = \frac{h^4}{48}.
\tag{20}
$$

The step size $h$ and number $M$ satisfy the relation $h = 5/(2M)$, and this is used in (20) to
get the relation

$$
|E_S(f, h)| \le \frac{625}{768M^4} \le 5 \times 10^{-9}.
\tag{21}
$$

Now rewrite (21) so that it is easier to solve for $M$:

$$
\frac{125}{768} \times 10^9 \le M^4.
\tag{22}
$$

Solving (22), we find that $112.95 \le M$. Since $M$ must be an integer, we choose $M = 113$
and the corresponding step size is $h = 5/226 = 0.02212389381$. When the composite
Simpson rule was performed, the result was

$$
S\left(f, \frac{5}{226}\right) = 1.252762969,
$$

which agrees with $\int_2^7 dx/x = \ln(x)\big|_{x=2}^{x=7} = 1.252762968$. Experimentation shows that it
takes about 129 function evaluations to achieve the desired accuracy of $5 \times 10^{-9}$, and when
the calculation is performed with $M = 64$, the result is

$$
S\left(f, \frac{5}{128}\right) = 1.252762973.
$$
■
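
The same calculation for Simpson's rule is sketched below. It assumes the bound K4 = 3/4 on |f⁽⁴⁾| over [2, 7] and solves (b − a)·K4·h⁴/180 ≤ tol with h = (b − a)/(2M); the names `K4` and `tol` are illustrative.

```matlab
% Choose M from the Simpson error bound, then check the actual error.
a = 2;  b = 7;  tol = 5e-9;
K4 = 3/4;                                     % bound on |f''''(x)| = 24/x^5 over [2,7]
M  = ceil(((b - a)^5*K4/(2880*tol))^(1/4));   % 2880 = 180 * 2^4
h  = (b - a)/(2*M);
f  = @(x) 1./x;
fx = f(a + h*(0:2*M));                        % fx(k+1) holds f(x_k)
S  = h/3*(fx(1) + fx(end)) + 2*h/3*sum(fx(3:2:end-2)) + 4*h/3*sum(fx(2:2:end-1));
err = log(b/a) - S;
fprintf('M = %d  evaluations = %d  S = %.9f  error = %.2e\n', M, 2*M + 1, S, err);
```

With M = 113 the rule uses 2M + 1 = 227 function evaluations, roughly one hundredth of the number needed by the trapezoidal rule for the same tolerance.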


So we see that the composite Simpson rule using 227 evaluations of $f(x)$ and
the composite trapezoidal rule using 22,823 evaluations of $f(x)$ achieve the same
accuracy. In Example 7.10, Simpson's rule required about $\tfrac{1}{100}$ the number of function
evaluations.

Program 7.1 (Composite Trapezoidal Rule). To approximate the integral

$$
\int_a^b f(x)\,dx \approx \frac{h}{2}\bigl(f(a) + f(b)\bigr) + h\sum_{k=1}^{M-1} f(x_k)
$$

by sampling $f(x)$ at the $M + 1$ equally spaced points $x_k = a + kh$, for $k = 0, 1, 2,
\ldots, M$. Notice that $x_0 = a$ and $x_M = b$.

function s=traprl(f,a,b,M)
%Input  - f is the integrand input as a string 'f'
%       - a and b are lower and upper limits of integration
%       - M is the number of subintervals
%Output - s is the trapezoidal rule sum
h=(b-a)/M;
s=0;
% Sum the integrand at the interior nodes x_1, ..., x_{M-1}
for k=1:(M-1)
   x=a+h*k;
   s=s+feval(f,x);
end
% End points carry weight h/2, interior nodes carry weight h
s=h*(feval(f,a)+feval(f,b))/2+h*s;


Program 7.2 (Composite Simpson Rule). To approximate the integral

$$
\int_a^b f(x)\,dx \approx \frac{h}{3}\bigl(f(a) + f(b)\bigr) + \frac{2h}{3}\sum_{k=1}^{M-1} f(x_{2k}) + \frac{4h}{3}\sum_{k=1}^{M} f(x_{2k-1})
$$

by sampling $f(x)$ at the $2M + 1$ equally spaced points $x_k = a + kh$, for $k = 0, 1,
2, \ldots, 2M$. Notice that $x_0 = a$ and $x_{2M} = b$.

function s=simprl(f,a,b,M)
%Input  - f is the integrand input as a string 'f'
%       - a and b are lower and upper limits of integration
%       - M is half the number of subintervals (2M subintervals are used)
%Output - s is the simpson rule sum
h=(b-a)/(2*M);
s1=0;
s2=0;
% s1 accumulates the odd-indexed nodes x_1, x_3, ..., x_{2M-1}
for k=1:M
   x=a+h*(2*k-1);
   s1=s1+feval(f,x);
end
% s2 accumulates the even-indexed interior nodes x_2, x_4, ..., x_{2M-2}
for k=1:(M-1)
   x=a+h*2*k;
   s2=s2+feval(f,x);
end
s=h*(feval(f,a)+feval(f,b)+4*s1+2*s2)/3;


Exercises for Composite Trapezoidal and Simpson's Rule

1. (i) Approximate each integral using the composite trapezoidal rule with $M = 10$.
(ii) Approximate each integral using the composite Simpson rule with $M = 5$.

(a) $\int_{-1}^{1} (1 + x^2)^{-1}\,dx$    (b) $\int_0^1 (2 + \sin(2\sqrt{x}))\,dx$    (c) $\int dx/\sqrt{x}$

(d) $\int_0^4 x^2 e^{-x}\,dx$    (e) $\int_0^2 2x\cos(x)\,dx$    (f) $\int \sin(2x)e^{-x}\,dx$

2. Length of a curve. The arc length of the curve $y = f(x)$ over the interval $a \le x \le b$ is

$$
\text{length} = \int_a^b \sqrt{1 + (f'(x))^2}\,dx.
$$

(i) Approximate the arc length of each function using the composite trapezoidal
rule with $M = 10$.

(ii) Approximate the arc length of each function using the composite Simpson rule
with $M = 5$.

(a) $f(x) = x^3$ for $0 \le x \le 1$

(b) $f(x) = \sin(x)$ for $0 \le x \le \pi/4$

(c) $f(x) = e^{-x}$ for $0 \le x \le 1$


3. Surface area. The solid of revolution obtained by rotating the region under the
curve $y = f(x)$, where $a \le x \le b$, about the $x$-axis has surface area given by

$$
\text{area} = 2\pi\int_a^b f(x)\sqrt{1 + (f'(x))^2}\,dx.
$$

(i) Approximate the surface area using the composite trapezoidal rule with $M = 10$.

(ii) Approximate the surface area using the composite Simpson rule with $M = 5$.

(a) $f(x) = x^3$ for $0 \le x \le 1$

(b) $f(x) = \sin(x)$ for $0 \le x \le \pi/4$

(c) $f(x) = e^{-x}$ for $0 \le x \le 1$

4. (a) Verify that the trapezoidal rule ($M = 1$, $h = 1$) is exact for polynomials of
degree $\le 1$ of the form $f(x) = c_1 x + c_0$ over [0, 1].

(b) Use the integrand $f(x) = c_2 x^2$ and verify that the error term for the trapezoidal
rule ($M = 1$, $h = 1$) over the interval [0, 1] is

$$
E_T(f, h) = \frac{-(b - a)f^{(2)}(c)h^2}{12}.
$$

5. (a) Verify that Simpson's rule ($M = 1$, $h = 1$) is exact for polynomials of degree
$\le 3$ of the form $f(x) = c_3 x^3 + c_2 x^2 + c_1 x + c_0$ over [0, 2].

(b) Use the integrand $f(x) = c_4 x^4$ and verify that the error term for Simpson's rule
($M = 1$, $h = 1$) over the interval [0, 2] is

$$
E_S(f, h) = \frac{-(b - a)f^{(4)}(c)h^4}{180}.
$$

6. Derive the trapezoidal rule ($M = 1$, $h = 1$) by using the method of undetermined
coefficients.

(a) Find the constants $w_0$ and $w_1$ so that $\int_0^1 g(t)\,dt = w_0 g(0) + w_1 g(1)$ is exact for
the two functions $g(t) = 1$ and $g(t) = t$.

(b) Use the relation $f(x_0 + ht) = g(t)$ and the change of variable $x = x_0 + ht$ and
$dx = h\,dt$ to translate the trapezoidal rule over [0, 1] to the interval $[x_0, x_1]$.

Hint for part (a). You will get a linear system involving the two unknowns $w_0$ and $w_1$.

7. Derive Simpson's rule ($M = 1$, $h = 1$) by using the method of undetermined
coefficients.

(a) Find the constants $w_0$, $w_1$, and $w_2$ so that $\int_0^2 g(t)\,dt = w_0 g(0) + w_1 g(1) +
w_2 g(2)$ is exact for the three functions $g(t) = 1$, $g(t) = t$, and $g(t) = t^2$.

(b) Use the relation $f(x_0 + ht) = g(t)$ and the change of variable $x = x_0 + ht$ and
$dx = h\,dt$ to translate Simpson's rule over [0, 2] to the interval $[x_0, x_2]$.

Hint for part (a). You will get a linear system involving the three unknowns $w_0$, $w_1$,
and $w_2$.

8. Determine the number $M$ and the interval width $h$ so that the composite trapezoidal
rule for $M$ subintervals can be used to compute the given integral with an accuracy of
$5 \times 10^{-9}$.

(a) $\int_{-\pi/6}^{\pi/6} \cos(x)\,dx$    (b) $\int_2^3 \dfrac{1}{5 - x}\,dx$    (c) $\int_0^2 xe^{-x}\,dx$

Hint for part (c). $f^{(2)}(x) = (x - 2)e^{-x}$.

9. Determine the number $M$ and the interval width $h$ so that the composite Simpson rule
for $2M$ subintervals can be used to compute the given integral with an accuracy of
$5 \times 10^{-9}$.

(a) $\int_{-\pi/6}^{\pi/6} \cos(x)\,dx$    (b) $\int_2^3 \dfrac{1}{5 - x}\,dx$    (c) $\int_0^2 xe^{-x}\,dx$

Hint for part (c). $f^{(4)}(x) = (x - 4)e^{-x}$.

10. Consider the definite integral $\int_{-0.1}^{0.1} \cos(x)\,dx = 2\sin(0.1) = 0.1996668333$. The
following table gives approximations using the composite trapezoidal rule. Calculate
$E_T(f, h) = 0.1996668 - T(f, h)$ and confirm that the order is $O(h^2)$.

  M     h         T(f, h)      E_T(f, h) = O(h^2)
  1     0.2       0.1990008
  2     0.1       0.1995004
  4     0.05      0.1996252
  8     0.025     0.1996564
  16    0.0125    0.1996642

11. Consider the definite integral $\int_{-0.75}^{0.75} \cos(x)\,dx = 2\sin(0.75) = 1.3632775200$. The
following table gives approximations using the composite Simpson rule. Calculate
$E_S(f, h) = 1.3632775 - S(f, h)$ and confirm that the order is $O(h^4)$.

  M     h          S(f, h)      E_S(f, h) = O(h^4)
  1     0.75       1.3658444
  2     0.375      1.3634298
  4     0.1875     1.3632869
  8     0.09375    1.3632781



12. Midpoint rule. The midpoint rule on $[x_0, x_1]$ is

$$
\int_{x_0}^{x_1} f(x)\,dx = hf\left(x_0 + \frac{h}{2}\right) + \frac{h^3}{24}f^{(2)}(c_1), \qquad\text{where } h = x_1 - x_0.
$$

(a) Expand $F(x)$, the antiderivative of $f(x)$, in a Taylor series about $x_0 + h/2$ and
establish the midpoint rule on $[x_0, x_1]$.

(b) Use part (a) and show that the composite midpoint rule for approximating the
integral of $f(x)$ over $[a, b]$ is

$$
M(f, h) = h\sum_{k=1}^{N} f\left(a + \left(k - \frac12\right)h\right), \qquad\text{where } h = \frac{b - a}{N}.
$$

This is an approximation to the integral of $f(x)$ over $[a, b]$, and we write

$$
\int_a^b f(x)\,dx \approx M(f, h).
$$

(c) Show that the error term $E_M(f, h)$ for part (b) is

$$
E_M(f, h) = \frac{h^3}{24}\sum_{k=1}^{N} f^{(2)}(c_k) = \frac{(b - a)f^{(2)}(c)h^2}{24} = O(h^2).
$$

13. Use the midpoint rule with $M = 10$ to approximate the integrals in Exercise 1.

14. Prove Corollary 7.3.


Algorithms and Programs

1. (a) For each integral in Exercise 1, compute $M$ and the interval width $h$ so that the
composite trapezoidal rule can be used to compute the given integral with an
accuracy of nine decimal places. Use Program 7.1 to approximate each integral.

(b) For each integral in Exercise 1, compute $M$ and the interval width $h$ so that the
composite Simpson's rule can be used to compute the given integral with an
accuracy of nine decimal places. Use Program 7.2 to approximate each integral.

2. Use Program 7.2 to approximate the definite integrals in Exercise 2 with an accuracy
of 11 decimal places.

3. The composite trapezoidal rule can be adapted to integrate a function known only at
a set of points. Adapt Program 7.1 to approximate the integral of a function over
an interval $[a, b]$ that passes through $M$ given points. (Note. The nodes need not
be equally spaced.) Use this program to approximate the integral of a function that
passes through the points $(\sqrt{k^2 + 1}, \ldots$

4. The composite Simpson's rule can be adapted to integrate a function known only at
a set of points. Adapt Program 7.2 to approximate the integral of a function over
an interval $[a, b]$ that passes through $M$ given points. (Note. The nodes need not
be equally spaced.) Use this program to approximate the integral of a function that
passes through the points $(\sqrt{k^2 + 1}, \ldots$

5. Modify Program 7.1 so that it uses the composite midpoint rule (Exercise 12) to
approximate the integral of $f(x)$ over $[a, b]$. Use this program to approximate the
definite integrals in Exercise 1 with an accuracy of 11 decimal places.


6. Obtain approximations to each of the following definite integrals with an accuracy of 
ten decimal places. Use any of the programs from this section. 

(a) / sin(l fx)dx (b) / —pr—dx 

Jl/ln ^lO- 5 sin{l/x) 

7. The following example shows how Simpson's rule can be used to approximate the
solution of an integral equation. The equation $v(x) = x^2 + 0.1\int_0^1 (x^2 + t)v(t)\,dt$ is to
be solved using Simpson's rule with $h = 1/2$. Let $t_0 = 0$, $t_1 = 1/2$, and $t_2 = 1$; then

$$
\int_0^1 (x^2 + t)v(t)\,dt \approx \frac{1}{6}\bigl((x^2 + 0)v_0 + 4(x^2 + \tfrac12)v_1 + (x^2 + 1)v_2\bigr).
$$

Let

$$
v(x_n) = x_n^2 + 0.1\Bigl(\frac{1}{6}\bigl((x_n^2 + 0)v_0 + 4(x_n^2 + \tfrac12)v_1 + (x_n^2 + 1)v_2\bigr)\Bigr).
\tag{1}
$$

Substituting $x_0 = 0$, $x_1 = 1/2$, and $x_2 = 1$ into equation (1) yields the system of
linear equations:

$$
\begin{aligned}
v_0 &= 0 + \frac{1}{60}\bigl((0)v_0 + 2v_1 + v_2\bigr) \\
v_1 &= \frac14 + \frac{1}{60}\bigl(\tfrac14 v_0 + 3v_1 + \tfrac54 v_2\bigr) \\
v_2 &= 1 + \frac{1}{60}\bigl(v_0 + 6v_1 + 2v_2\bigr)
\end{aligned}
\tag{2}
$$

Substituting the solution of system (2) ($v_0 = 0.0273$, $v_1 = 0.2866$, $v_2 = 1.0646$) into
equation (1) and simplifying yields the approximation

$$
v(x) \approx 1.037305x^2 + 0.027297.
\tag{3}
$$

(a) As a check, substitute the solution into the right-hand side of the integral equation,
integrate and simplify the right-hand side, and compare the result with the
approximation in (3).

(b) Use the composite Simpson rule with $h = 0.5$ to approximate the solution of
the integral equation

$$
v(x) = x^2 - 0.1\int_0^1 (x^2 + t)v(t)\,dt.
$$

Use the procedure outlined in part (a) to check your solution.
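
The linear system (2) in the worked example of Exercise 7 can be assembled and solved numerically. The following is a sketch under the assumptions stated in that example (Simpson's rule with h = 1/2 and nodes 0, 1/2, 1); the matrix names `K` and `A` are illustrative.

```matlab
% Solve v(x) = x^2 + 0.1*int_0^1 (x^2 + t) v(t) dt at the Simpson nodes.
t = [0; 1/2; 1];                        % nodes t_0, t_1, t_2 (also used as x_n)
w = (1/6)*[1 4 1];                      % Simpson weights (h/3)*[1 4 1], h = 1/2
K = (t.^2)*ones(1,3) + ones(3,1)*t';    % K(n,j) = x_n^2 + t_j
A = eye(3) - 0.1*K.*(ones(3,1)*w);      % system (2) written as A*v = x.^2
v = A\(t.^2);                           % v = [v0; v1; v2]
c2 = 1 + 0.1*(w*v);                     % coefficient of x^2 in v(x)
c0 = 0.1*(w*(t.*v));                    % constant term in v(x)
fprintf('v = [%.4f %.4f %.4f],  v(x) ~ %.6f x^2 + %.6f\n', v, c2, c0);
```

The coefficients differ from (3) only in the last printed digit, because (3) was formed from the rounded values of the solution.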


7.3 Recursive Rules and Romberg Integration 

In this section we show how to compute Simpson approximations with a special linear 
combination of trapezoidal rules. The approximation will have greater accuracy if one 
uses a larger number of subintervals. How many should we choose? The sequential 
process helps answer this question by trying two subintervals, four subintervals, and 
so on, until the desired accuracy is obtained. First, a sequence $\{T(J)\}$ of trapezoidal
rule approximations must be generated. As the number of subintervals is doubled, the 
number of function values is roughly doubled, because the function must be evaluated 
at all the previous points and at the midpoints of the previous subintervals (see Fig¬ 
ure 7.8). Theorem 7.4 explains how to eliminate redundant function evaluations and 
additions. 

Theorem 7.4 (Successive Trapezoidal Rules). Suppose that $J \ge 1$ and the points
$\{x_k = a + kh\}$ subdivide $[a, b]$ into $2^J = 2M$ subintervals of equal width $h =
(b - a)/2^J$. The trapezoidal rules $T(f, h)$ and $T(f, 2h)$ obey the relationship

$$
T(f, h) = \frac{T(f, 2h)}{2} + h\sum_{k=1}^{M} f(x_{2k-1}).
\tag{1}
$$

Figure 7.8 (a) $T(0)$ is the area under $2^0 = 1$ trapezoid. (b) $T(1)$ is the area under
$2^1 = 2$ trapezoids. (c) $T(2)$ is the area under $2^2 = 4$ trapezoids. (d) $T(3)$ is the area
under $2^3 = 8$ trapezoids.


Definition 7.3 (Sequence of Trapezoidal Rules). Define $T(0) = (h/2)(f(a) +
f(b))$, which is the trapezoidal rule with step size $h = b - a$. Then for each $J \ge 1$
define $T(J) = T(f, h)$, where $T(f, h)$ is the trapezoidal rule with step size $h =
(b - a)/2^J$.

Corollary 7.4 (Recursive Trapezoidal Rule). Start with $T(0) = (h/2)(f(a) +
f(b))$. Then a sequence of trapezoidal rules $\{T(J)\}$ is generated by the recursive
formula

$$
T(J) = \frac{T(J - 1)}{2} + h\sum_{k=1}^{M} f(x_{2k-1}) \qquad\text{for } J = 1, 2, \ldots,
\tag{2}
$$

where $h = (b - a)/2^J$ and $\{x_k = a + kh\}$.

Proof. For the even nodes $x_0 < x_2 < \cdots < x_{2M-2} < x_{2M}$, we use the trapezoidal
rule with step size $2h$:

$$
T(J - 1) = \frac{2h}{2}\bigl(f_0 + 2f_2 + 2f_4 + \cdots + 2f_{2M-4} + 2f_{2M-2} + f_{2M}\bigr).
\tag{3}
$$

For all of the nodes $x_0 < x_1 < x_2 < \cdots < x_{2M-1} < x_{2M}$, we use the trapezoidal rule
with step size $h$:

$$
T(J) = \frac{h}{2}\bigl(f_0 + 2f_1 + 2f_2 + \cdots + 2f_{2M-2} + 2f_{2M-1} + f_{2M}\bigr).
\tag{4}
$$

Collecting the even and odd subscripts in (4) yields

$$
T(J) = \frac{h}{2}\bigl(f_0 + 2f_2 + \cdots + 2f_{2M-2} + f_{2M}\bigr) + h\sum_{k=1}^{M} f_{2k-1}.
\tag{5}
$$

Substituting (3) into (5) results in $T(J) = T(J - 1)/2 + h\sum_{k=1}^{M} f_{2k-1}$, and the proof
of the theorem is complete. •


Example 7.11. Use the sequential trapezoidal rule to compute the approximations $T(0)$,
$T(1)$, $T(2)$, and $T(3)$ for the integral $\int_1^5 dx/x = \ln(5) - \ln(1) = 1.609437912$.

Table 7.4 shows the nine values required to compute $T(3)$ and the midpoints required
to compute $T(1)$, $T(2)$, and $T(3)$. Details for obtaining the results are as follows:

$$
\begin{aligned}
\text{When } h = 4:\quad T(0) &= \frac{4}{2}(1.000000 + 0.200000) = 2.400000. \\
\text{When } h = 2:\quad T(1) &= \frac{T(0)}{2} + 2(0.333333) \\
&= 1.200000 + 0.666666 = 1.866666. \\
\text{When } h = 1:\quad T(2) &= \frac{T(1)}{2} + 1(0.500000 + 0.250000) \\
&= 0.933333 + 0.750000 = 1.683333. \\
\text{When } h = \tfrac12:\quad T(3) &= \frac{T(2)}{2} + \frac12(0.666667 + 0.400000 + 0.285714 + 0.222222) \\
&= 0.841667 + 0.787302 = 1.628968.
\end{aligned}
$$
■
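
The recursion of Corollary 7.4 can be sketched in a few lines. The code below reworks Example 7.11 and assumes the integrand is an anonymous function; because MATLAB indices start at 1, the array entry `T(J+1)` stores the quantity written T(J) in the text.

```matlab
% Recursive trapezoidal rule for int_1^5 dx/x, J = 0, 1, 2, 3.
f = @(x) 1./x;
a = 1;  b = 5;  Jmax = 3;
T = zeros(1, Jmax + 1);               % T(J+1) stores the text's T(J)
h = b - a;
T(1) = h/2*(f(a) + f(b));             % T(0), one trapezoid
for J = 1:Jmax
    h = h/2;                          % h = (b-a)/2^J
    M = 2^(J-1);                      % number of new (odd-indexed) nodes
    xnew = a + h*(2*(1:M) - 1);       % x_1, x_3, ..., x_{2M-1}
    T(J+1) = T(J)/2 + h*sum(f(xnew)); % formula (2)
end
fprintf('%.6f\n', T);
```

Each pass evaluates f only at the new midpoints, so the total number of function evaluations for T(J) is 2^J + 1 rather than the sum of the counts for all the earlier rules.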

Our next result shows an important relationship between the trapezoidal rule and
Simpson's rule. When the trapezoidal rule is computed using step sizes $2h$ and $h$,
the result is $T(f, 2h)$ and $T(f, h)$, respectively. These values are combined to obtain
Simpson's rule:

$$
S(f, h) = \frac{4T(f, h) - T(f, 2h)}{3}.
\tag{6}
$$


Theorem 7.5 (Recursive Simpson Rules). Suppose that $\{T(J)\}$ is the sequence of
trapezoidal rules generated by Corollary 7.4. If $J \ge 1$ and $S(J)$ is Simpson's rule for
$2^J$ subintervals of $[a, b]$, then $S(J)$ and the trapezoidal rules $T(J - 1)$ and $T(J)$ obey
the relationship

$$
S(J) = \frac{4T(J) - T(J - 1)}{3} \qquad\text{for } J = 1, 2, \ldots.
\tag{7}
$$

Table 7.4 The Nine Points Used to Compute T(3) and the Midpoints Required to Compute
T(1), T(2), and T(3)

   x      f(x) = 1/x    End points for    Midpoints for     Midpoints for     Midpoints for
                        computing T(0)    computing T(1)    computing T(2)    computing T(3)
  1.0     1.000000      1.000000
  1.5     0.666667                                                            0.666667
  2.0     0.500000                                          0.500000
  2.5     0.400000                                                            0.400000
  3.0     0.333333                        0.333333
  3.5     0.285714                                                            0.285714
  4.0     0.250000                                          0.250000
  4.5     0.222222                                                            0.222222
  5.0     0.200000      0.200000


Proof. The trapezoidal rule $T(J)$ with step size $h$ yields the approximation

(8)  $\int_a^b f(x)\,dx \approx \frac{h}{2}\left(f_0 + 2f_1 + 2f_2 + \cdots + 2f_{2M-2} + 2f_{2M-1} + f_{2M}\right) = T(J).$

The trapezoidal rule $T(J-1)$ with step size $2h$ produces

(9)  $\int_a^b f(x)\,dx \approx h\left(f_0 + 2f_2 + \cdots + 2f_{2M-2} + f_{2M}\right) = T(J-1).$

Multiplying relation (8) by 4 yields

(10)  $4\int_a^b f(x)\,dx \approx h\left(2f_0 + 4f_1 + 4f_2 + \cdots + 4f_{2M-2} + 4f_{2M-1} + 2f_{2M}\right) = 4T(J).$

Now subtract (9) from (10) and the result is

(11)  $3\int_a^b f(x)\,dx \approx h\left(f_0 + 4f_1 + 2f_2 + \cdots + 2f_{2M-2} + 4f_{2M-1} + f_{2M}\right) = 4T(J) - T(J-1).$

This can be rearranged to obtain

(12)  $\int_a^b f(x)\,dx \approx \frac{h}{3}\left(f_0 + 4f_1 + 2f_2 + \cdots + 2f_{2M-2} + 4f_{2M-1} + f_{2M}\right) = \dfrac{4T(J) - T(J-1)}{3}.$

The middle term in (12) is Simpson's rule $S(J) = S(f, h)$ and hence the theorem is proved. ■
Example 7.12. Use the sequential Simpson rule to compute the approximations $S(1)$, $S(2)$, and $S(3)$ for the integral of Example 7.11.

Using the results of Example 7.11 and formula (7) with $J = 1$, 2, and 3, we compute

$S(1) = \dfrac{4T(1) - T(0)}{3} = \dfrac{4(1.866666) - 2.400000}{3} = 1.688888,$

$S(2) = \dfrac{4T(2) - T(1)}{3} = \dfrac{4(1.683333) - 1.866666}{3} = 1.622222,$

$S(3) = \dfrac{4T(3) - T(2)}{3} = \dfrac{4(1.628968) - 1.683333}{3} = 1.610846.$ ■


In Section 7.1 the formula for Boole's rule was given in Theorem 7.1. It was obtained by integrating the Lagrange polynomial of degree 4 based on the nodes $x_0$, $x_1$, $x_2$, $x_3$, and $x_4$. An alternative method for establishing Boole's rule is mentioned in the exercises. When it is applied $M$ times over $4M$ equally spaced subintervals of $[a, b]$ of step size $h = (b - a)/(4M)$, we call it the composite Boole rule:

(13)  $B(f, h) = \dfrac{2h}{45}\sum_{k=1}^{M}\left(7f_{4k-4} + 32f_{4k-3} + 12f_{4k-2} + 32f_{4k-1} + 7f_{4k}\right).$

The next result gives the relationship between the sequential Boole and Simpson rules.

Theorem 7.6 (Recursive Boole Rules). Suppose that $\{S(J)\}$ is the sequence of Simpson's rules generated by Theorem 7.5. If $J \ge 2$ and $B(J)$ is Boole's rule for $2^J$ subintervals of $[a, b]$, then $B(J)$ and Simpson's rules $S(J-1)$ and $S(J)$ obey the relationship

(14)  $B(J) = \dfrac{16S(J) - S(J-1)}{15}$  for $J = 2, 3, \ldots.$

Proof. The proof is left as an exercise for the reader. ■
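Although the proof is left to the reader, the relationship in Theorem 7.6 can be checked numerically. The Python sketch below (Python rather than the book's MATLAB) evaluates the composite Boole rule (13) directly and compares it with $(16S(J) - S(J-1))/15$. It assumes the integrand $f(x) = 1/x$ on $[1, 5]$ suggested by the tabulated values of Example 7.11, and the helper names are hypothetical.

```python
def composite_simpson(f, a, b, n):
    """Composite Simpson rule on n equal subintervals (n must be even)."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4.0 * sum(f(a + k * h) for k in range(1, n, 2))
    s += 2.0 * sum(f(a + k * h) for k in range(2, n, 2))
    return h * s / 3.0


def composite_boole(f, a, b, n):
    """Composite Boole rule, formula (13), on n equal subintervals (n a multiple of 4)."""
    h = (b - a) / n
    total = 0.0
    for k in range(1, n // 4 + 1):
        x = [a + (4 * k - 4 + j) * h for j in range(5)]
        total += (7 * f(x[0]) + 32 * f(x[1]) + 12 * f(x[2])
                  + 32 * f(x[3]) + 7 * f(x[4]))
    return 2.0 * h * total / 45.0


f = lambda x: 1.0 / x  # assumed integrand of Example 7.11
direct = composite_boole(f, 1.0, 5.0, 8)           # B(3), evaluated from (13)
s3 = composite_simpson(f, 1.0, 5.0, 8)              # S(3)
s2 = composite_simpson(f, 1.0, 5.0, 4)              # S(2)
recursive = (16.0 * s3 - s2) / 15.0                 # B(3), from Theorem 7.6
print(direct, recursive)
```

The two values agree to rounding error, which is what the theorem asserts for $J = 3$.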

Example 7.13. Use the sequential Boole rule to compute the approximations $B(2)$ and $B(3)$ for the integral of Example 7.11.

Using the results of Example 7.12 and formula (14) with $J = 2$ and 3, we compute

$B(2) = \dfrac{16S(2) - S(1)}{15} = \dfrac{16(1.622222) - 1.688888}{15} = 1.617778,$

$B(3) = \dfrac{16S(3) - S(2)}{15} = \dfrac{16(1.610846) - 1.622222}{15} = 1.610088.$ ■

The reader may wonder what we are leading up to. We will now show that formulas (7) and (14) are special cases of the process of Romberg integration. Let us announce that the next level of approximation for the integral of Example 7.11 is

$\dfrac{64B(3) - B(2)}{63} = \dfrac{64(1.610088) - 1.617778}{63},$

and this answer gives an accuracy of five decimal places.
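Formulas (7) and (14) share one pattern: combine a fine and a coarse value with weights $4^K$ and $-1$, then divide by $4^K - 1$. A short Python sketch (not the book's MATLAB) makes this explicit. The function name `improve` is hypothetical, and the starting values are the rounded trapezoidal results of Example 7.11.

```python
def improve(fine, coarse, K):
    """One improvement step: (4^K * fine - coarse) / (4^K - 1)."""
    return (4.0**K * fine - coarse) / (4.0**K - 1.0)


# Rounded trapezoidal values T(0), ..., T(3) from Example 7.11.
T = [2.400000, 1.866666, 1.683333, 1.628968]

# K = 1 gives the sequential Simpson rule (7); K = 2 gives the Boole rule (14).
S = {J: improve(T[J], T[J - 1], 1) for J in (1, 2, 3)}
B = {J: improve(S[J], S[J - 1], 2) for J in (2, 3)}

for J in (1, 2, 3):
    print(f"S({J}) = {S[J]:.6f}")
for J in (2, 3):
    print(f"B({J}) = {B[J]:.6f}")
```

Writing the step once with the parameter `K` is the same design choice the Romberg scheme makes: every column of improvements uses this formula with a different power of 4.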
Romberg Integration

In Section 7.2 we saw that the error terms $E_T(f, h)$ and $E_S(f, h)$ for the composite trapezoidal rule and composite Simpson rule are of order $O(h^2)$ and $O(h^4)$, respectively. It is not difficult to show that the error term $E_B(f, h)$ for the composite Boole rule is of the order $O(h^6)$. Thus we have the pattern

(15)  $\int_a^b f(x)\,dx = T(f, h) + O(h^2),$

(16)  $\int_a^b f(x)\,dx = S(f, h) + O(h^4),$

(17)  $\int_a^b f(x)\,dx = B(f, h) + O(h^6).$

The pattern for the remainders in (15) through (17) is extended in the following sense. Suppose that an approximation rule is used with step sizes $h$ and $2h$; then an algebraic manipulation of the two answers is used to produce an improved answer. Each successive level of improvement increases the order of the error term from $O(h^{2N})$ to $O(h^{2N+2})$. This process, called Romberg integration, has its strengths and weaknesses.

The Newton-Cotes rules are seldom used past Boole's rule. This is because the nine-point Newton-Cotes quadrature rule involves negative weights, and all the rules past the ten-point rule involve negative weights. This could introduce loss of significance error due to round off. The Romberg method has the advantages that all the weights are positive and the equally spaced abscissas are easy to compute.

A computational weakness of Romberg integration is that twice as many function evaluations are needed to decrease the error from $O(h^{2N})$ to $O(h^{2N+2})$. The use of the sequential rules will help keep the number of computations down. The development of Romberg integration relies on the theoretical assumption that, if $f \in C^N[a, b]$ for all $N$, then the error term for the trapezoidal rule can be represented in a series involving only even powers of $h$; that is,

(18)  $\int_a^b f(x)\,dx = T(f, h) + E_T(f, h),$

where

(19)  $E_T(f, h) = a_1h^2 + a_2h^4 + a_3h^6 + \cdots.$

A derivation of formula (19) can be found in Reference [153].

Since only even powers of $h$ can occur in (19), the Richardson improvement process is used successively first to eliminate $a_1$, next to eliminate $a_2$, then to eliminate $a_3$, and so on. This process generates quadrature formulas whose error terms have even orders $O(h^4)$, $O(h^6)$, $O(h^8)$, and so on. We shall show that the first improvement is Simpson's rule for $2M$ intervals. Start with $T(f, 2h)$ and $T(f, h)$ and the equations

(20)  $\int_a^b f(x)\,dx = T(f, 2h) + a_1 4h^2 + a_2 16h^4 + a_3 64h^6 + \cdots$

and

(21)  $\int_a^b f(x)\,dx = T(f, h) + a_1h^2 + a_2h^4 + a_3h^6 + \cdots.$

Multiply equation (21) by 4 and obtain
(22)  $4\int_a^b f(x)\,dx = 4T(f, h) + a_1 4h^2 + a_2 4h^4 + a_3 4h^6 + \cdots.$

Eliminate $a_1$ by subtracting (20) from (22). The result is

(23)  $3\int_a^b f(x)\,dx = 4T(f, h) - T(f, 2h) - a_2 12h^4 - a_3 60h^6 - \cdots.$

Now divide equation (23) by 3 and rename the coefficients in the series:

(24)  $\int_a^b f(x)\,dx = \dfrac{4T(f, h) - T(f, 2h)}{3} + b_1h^4 + b_2h^6 + \cdots.$

As noted in (6), the first quantity on the right side of (24) is Simpson's rule $S(f, h)$. This shows that $E_S(f, h)$ involves only even powers of $h$:

(25)  $\int_a^b f(x)\,dx = S(f, h) + b_1h^4 + b_2h^6 + b_3h^8 + \cdots.$

To show that the second improvement is Boole's rule, start with (25) and write down the formula involving $S(f, 2h)$:

(26)  $\int_a^b f(x)\,dx = S(f, 2h) + b_1 16h^4 + b_2 64h^6 + b_3 256h^8 + \cdots.$

When $b_1$ is eliminated from (25) and (26), the result involves Boole's rule:

(27)  $\int_a^b f(x)\,dx = \dfrac{16S(f, h) - S(f, 2h)}{15} - \dfrac{b_2 48h^6}{15} - \dfrac{b_3 240h^8}{15} - \cdots = B(f, h) - \dfrac{b_2 48h^6}{15} - \dfrac{b_3 240h^8}{15} - \cdots.$

The general pattern for Romberg integration relies on Lemma 7.1.

Lemma 7.1 (Richardson's Improvement for Romberg Integration). Given two approximations $R(2h, K-1)$ and $R(h, K-1)$ for the quantity $Q$ that satisfy

(28)  $Q = R(h, K-1) + c_1h^{2K} + c_2h^{2K+2} + \cdots$

and

(29)  $Q = R(2h, K-1) + c_1 4^K h^{2K} + c_2 4^{K+1} h^{2K+2} + \cdots,$

an improved approximation has the form

(30)  $Q = \dfrac{4^K R(h, K-1) - R(2h, K-1)}{4^K - 1} + O(h^{2K+2}).$

The proof is straightforward and is left for the reader.
Definition 7.4. Define the sequence $\{R(J, K) : J \ge K\}_{J=0}^{\infty}$ of quadrature formulas for $f(x)$ over $[a, b]$ as follows:

(31)  $R(J, 0) = T(J)$ for $J \ge 0$, is the sequential trapezoidal rule.
      $R(J, 1) = S(J)$ for $J \ge 1$, is the sequential Simpson rule.
      $R(J, 2) = B(J)$ for $J \ge 2$, is the sequential Boole's rule. ▲

The starting rules, $\{R(J, 0)\}$, are used to generate the first improvement, $\{R(J, 1)\}$, which in turn is used to generate the second improvement, $\{R(J, 2)\}$. We have already seen the patterns

(32)  $R(J, 1) = \dfrac{4^1 R(J, 0) - R(J-1, 0)}{4^1 - 1}$  for $J \ge 1$,
      $R(J, 2) = \dfrac{4^2 R(J, 1) - R(J-1, 1)}{4^2 - 1}$  for $J \ge 2$,

which are the rules in (24) and (27) stated using the notation in (31). The general rule for constructing improvements is

(33)  $R(J, K) = \dfrac{4^K R(J, K-1) - R(J-1, K-1)}{4^K - 1}$  for $J \ge K$.
Table 7.5  Romberg Integration Tableau

  J    R(J, 0)        R(J, 1)        R(J, 2)        R(J, 3)        R(J, 4)
       Trapezoidal    Simpson's      Boole's        Third          Fourth
       rule           rule           rule           improvement    improvement
  0    R(0, 0)
  1    R(1, 0)        R(1, 1)
  2    R(2, 0)        R(2, 1)        R(2, 2)
  3    R(3, 0)        R(3, 1)        R(3, 2)        R(3, 3)
  4    R(4, 0)        R(4, 1)        R(4, 2)        R(4, 3)        R(4, 4)

Table 7.6  Romberg Integration Tableau for Example 7.14

  J    R(J, 0)            R(J, 1)            R(J, 2)            R(J, 3)
       Trapezoidal rule   Simpson's rule     Boole's rule       Third improvement
  0    0.785398163397
  1    1.726812656758     2.040617487878
  2    1.960534166564     2.038441336499     2.038296259740
  3    2.018793948078     2.038213875249     2.038198711166     2.038197162776
  4    2.033347341805     2.038198473047     2.038197446234     2.038197426156
  5    2.036984954990     2.038197492719     2.038197427363     2.038197427064


For computational purposes, the values $R(J, K)$ are arranged in the Romberg integration tableau given in Table 7.5.

Example 7.14. Use Romberg integration to find approximations for the definite integral

$\int_0^{\pi/2} (x^2 + x + 1)\cos(x)\,dx = -2 + \dfrac{\pi}{2} + \dfrac{\pi^2}{4} = 2.038197427067\ldots.$

The computations are given in Table 7.6. In each column the numbers are converging to the value 2.038197427067.... The values in the Simpson's rule column converge faster than the values in the trapezoidal rule column. For this example, convergence in columns to the right is faster than the adjacent column to the left.

Convergence of the Romberg values in Table 7.6 is easier to see if we look at the error terms $E(J, K) = -2 + \pi/2 + \pi^2/4 - R(J, K)$. Suppose that the interval width is $h = b - a$ and that the higher derivatives of $f(x)$ are of the same magnitude. The error in column $K$ of the Romberg table diminishes by about a factor of $1/2^{2K+2} = 1/4^{K+1}$ as one progresses down its rows. The errors $E(J, 0)$ diminish by a factor of 1/4, the errors $E(J, 1)$ diminish by a factor of 1/16, and so on. This can be observed by inspecting the entries $\{E(J, K)\}$ in Table 7.7. ■
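The tableau of Example 7.14 and the predicted error ratios can be reproduced with a short Python sketch (Python instead of the book's MATLAB; the function name `romberg_table` is hypothetical). It builds column 0 with the sequential trapezoidal rule and fills each row with rule (33).

```python
import math


def romberg_table(f, a, b, n):
    """Lower-triangular Romberg table R[J][K] for 0 <= K <= J <= n."""
    R = [[0.0] * (n + 1) for _ in range(n + 1)]
    h = b - a
    R[0][0] = h * (f(a) + f(b)) / 2.0
    m = 1  # number of new midpoints at the next level
    for J in range(1, n + 1):
        h /= 2.0
        s = sum(f(a + h * (2 * p - 1)) for p in range(1, m + 1))
        R[J][0] = R[J - 1][0] / 2.0 + h * s
        m *= 2
        for K in range(1, J + 1):
            # Equivalent to (4^K R(J,K-1) - R(J-1,K-1)) / (4^K - 1).
            R[J][K] = R[J][K - 1] + (R[J][K - 1] - R[J - 1][K - 1]) / (4.0**K - 1.0)
    return R


f = lambda x: (x * x + x + 1.0) * math.cos(x)
exact = -2.0 + math.pi / 2.0 + math.pi**2 / 4.0
R = romberg_table(f, 0.0, math.pi / 2.0, 5)

# Ratios of successive errors down columns K = 0, 1, 2; expected near 1/4, 1/16, 1/64.
for K in range(3):
    ratios = [abs(exact - R[J][K]) / abs(exact - R[J - 1][K]) for J in range(K + 2, 6)]
    print(K, [f"{r:.4f}" for r in ratios])
```

The printed ratios approach $1/4^{K+1}$ only for the lower rows, where the leading error term dominates.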
Table 7.7  Romberg Error Tableau for Example 7.14



Theorem 7.7 (Precision of Romberg Integration). Assume that $f \in C^{2K+2}[a, b]$. Then the truncation error term for the Romberg approximation is given in the formula

(34)  $\int_a^b f(x)\,dx = R(J, K) + b_K h^{2K+2} f^{(2K+2)}(c_{J,K}) = R(J, K) + O(h^{2K+2}),$

where $h = (b - a)/2^J$, $b_K$ is a constant that depends on $K$, and $c_{J,K} \in [a, b]$; see Reference [153], page 126.

Example 7.15. Apply Theorem 7.7 and show that

$\int_0^2 10x^9\,dx = 1024 = R(4, 4).$

The integrand is $f(x) = 10x^9$, and $f^{(10)}(x) = 0$. Thus the value $K = 4$ will make the error term identically zero. A numerical computation will produce $R(4, 4) = 1024$. ■

Program 7.3 (Recursive Trapezoidal Rule). To approximate

$\int_a^b f(x)\,dx \approx \dfrac{h}{2}\sum_{k=1}^{2^J}\left(f(x_{k-1}) + f(x_k)\right)$

by using the trapezoidal rule and successively increasing the number of subintervals of $[a, b]$. The $J$th iteration samples $f(x)$ at $2^J + 1$ equally spaced points.

function T=rctrap(f,a,b,n)
%Input  - f is the integrand input as a string 'f'
%       - a and b are upper and lower limits of integration
%       - n is the number of times for recursion
%Output - T is the recursive trapezoidal rule list
M=1;
h=b-a;
T=zeros(1,n+1);
T(1)=h*(feval(f,a)+feval(f,b))/2;
for j=1:n
   M=2*M;
   h=h/2;
   s=0;
   for k=1:M/2
      x=a+h*(2*k-1);
      s=s+feval(f,x);
   end
   T(j+1)=T(j)/2+h*s;
end

Program 7.4 (Romberg Integration). To approximate the integral

$\int_a^b f(x)\,dx \approx R(J, J)$

by generating a table of approximations $R(J, K)$ for $J \ge K$ and using $R(J+1, J+1)$ as the final answer. The approximations $R(J, K)$ are stored in a special lower-triangular matrix. The elements $R(J, 0)$ of column 0 are computed using the sequential trapezoidal rule based on $2^J$ subintervals of $[a, b]$; then the remaining entries $R(J, K)$ are computed using Romberg's rule. The elements of row $J$ are

$R(J, K) = R(J, K-1) + \dfrac{R(J, K-1) - R(J-1, K-1)}{4^K - 1}$

for $1 \le K \le J$. The program is terminated in the $(J+1)$st row when $|R(J, J) - R(J+1, J+1)| < \text{tol}$.

function [R,quad,err,h]=romber(f,a,b,n,tol)
%Input  - f is the integrand input as a string 'f'
%       - a and b are upper and lower limits of integration
%       - n is the maximum number of rows in the table
%       - tol is the tolerance
%Output - R is the Romberg table
%       - quad is the quadrature value
%       - err is the error estimate
%       - h is the smallest step size used
M=1;
h=b-a;
err=1;
J=0;
R=zeros(4,4);
R(1,1)=h*(feval(f,a)+feval(f,b))/2;
while((err>tol)&(J<n))|(J<4)
   J=J+1;
   h=h/2;
   s=0;
   for p=1:M
      x=a+h*(2*p-1);
      s=s+feval(f,x);
   end
   R(J+1,1)=R(J,1)/2+h*s;
   %Double the number of new midpoints for the next row
   M=2*M;
   for K=1:J
      R(J+1,K+1)=R(J+1,K)+(R(J+1,K)-R(J,K))/(4^K-1);
   end
   err=abs(R(J,J)-R(J+1,K+1));
end
quad=R(J+1,J+1);

Exercises for Recursive Rules and Romberg Integration

1. For each of the following definite integrals, construct (by hand) a Romberg table (Table 7.5) with three rows.

(a) $\int_0^3 \dfrac{\sin(2x)}{1 + x^5}\,dx = 0.6717578646\ldots$

(b) $\int_0^3 \sin(4x)e^{-2x}\,dx = 0.1997146621\ldots$

(c) $\int_{0.04}^1 \dfrac{1}{\sqrt{x}}\,dx = 1.6$

(d) $\int_0^2 \dfrac{1}{x^2 + \frac{1}{10}}\,dx = 4.4713993943\ldots$

(e) $\int_{1/(2\pi)}^2 \sin\left(\dfrac{1}{x}\right)dx = 1.1140744942\ldots$

(f) $\int_0^2 \sqrt{4x - x^2}\,dx = \pi = 3.1415926535\ldots$
2. Assume that the sequential trapezoidal rule converges to $L$ (i.e., $\lim_{J\to\infty} T(J) = L$).

(a) Show that the sequential Simpson rule converges to $L$ (i.e., $\lim_{J\to\infty} S(J) = L$).

(b) Show that the sequential Boole rule converges to $L$ (i.e., $\lim_{J\to\infty} B(J) = L$).

3. (a) Verify that Boole's rule ($M = 1$, $h = 1$) is exact for polynomials of degree $\le 5$ of the form $f(x) = c_5x^5 + c_4x^4 + \cdots + c_1x + c_0$ over $[0, 4]$.

(b) Use the integrand $f(x) = c_6x^6$ and verify that the error term for Boole's rule ($M = 1$, $h = 1$) over the interval $[0, 4]$ is

$\dfrac{-2(b - a)f^{(6)}(c)h^6}{945}.$

4. Derive Boole's rule ($M = 1$, $h = 1$) by using the method of undetermined coefficients: Find the constants $w_0$, $w_1$, $w_2$, $w_3$, and $w_4$ so that

$\int_0^4 g(t)\,dt = w_0g(0) + w_1g(1) + w_2g(2) + w_3g(3) + w_4g(4)$

is exact for the five functions $g(t) = 1$, $t$, $t^2$, $t^3$, and $t^4$. Hint. You will get the linear system:

$w_0 + w_1 + w_2 + w_3 + w_4 = 4$
$w_1 + 2w_2 + 3w_3 + 4w_4 = 8$
$w_1 + 4w_2 + 9w_3 + 16w_4 = \dfrac{64}{3}$
$w_1 + 8w_2 + 27w_3 + 64w_4 = 64$
$w_1 + 16w_2 + 81w_3 + 256w_4 = \dfrac{1024}{5}$

5. Establish the relation $B(J) = (16S(J) - S(J-1))/15$ for the case $J = 2$. Use the following information:

$S(1) = \dfrac{2h}{3}(f_0 + 4f_2 + f_4)$

and

$S(2) = \dfrac{h}{3}(f_0 + 4f_1 + 2f_2 + 4f_3 + f_4).$

6. Simpson's $\frac{3}{8}$ rule. Consider the trapezoidal rules over the closed interval $[x_0, x_3]$: $T(f, 3h) = (3h/2)(f_0 + f_3)$ with step size $3h$, and $T(f, h) = (h/2)(f_0 + 2f_1 + 2f_2 + f_3)$ with step size $h$. Show that the linear combination $(9T(f, h) - T(f, 3h))/8$ produces Simpson's $\frac{3}{8}$ rule.

7. Use equations (25) and (26) to establish equation (27).

8. Use equations (28) and (29) to establish equation (30).


9. Determine the smallest integer $K$ for which

(a) $\int_0^2 8x^7\,dx = 256 = R(K, K).$

(b) $\int_0^2 11x^{10}\,dx = 2048 = R(K, K).$

10. Romberg integration was used to approximate the integrals (i) $\int_0^1 \sqrt{x}\,dx$ and (ii) $\int_0^1 2t^2\,dt$, and the results are given in the following table:

  Approximations for (i)      Approximations for (ii)
  R(1, 1) = 0.6380712         R(1, 1) = 0.6666667
  R(2, 2) = 0.6577566         R(2, 2) = 0.6666667
  R(3, 3) = 0.6636076         R(3, 3) = 0.6666667
  R(4, 4) = 0.6655929         R(4, 4) = 0.6666667

(a) Use the change of variable $x = t^2$ and $dx = 2t\,dt$ and show that the two integrals have the same numerical value.

(b) Discuss why convergence of the Romberg sequence is slower for integral (i) and faster for integral (ii).

11. Romberg integration based on the midpoint rule. The composite midpoint rule is competitive with the composite trapezoidal rule with respect to efficiency and the speed of convergence. Use the following facts about the midpoint rule: $\int_a^b f(x)\,dx = M(f, h) + E_M(f, h)$. The rule $M(f, h)$ and the error term $E_M(f, h)$ are given by

$M(f, h) = h\sum_{k=1}^{N} f\left(a + \left(k - \tfrac{1}{2}\right)h\right)$, where $h = \dfrac{b - a}{N}$,

$E_M(f, h) = a_1h^2 + a_2h^4 + a_3h^6 + \cdots.$

(a) Start with

$M(0) = (b - a)f\left(\dfrac{a + b}{2}\right)$

and develop the sequential midpoint rule for computing

$M(J) = M(f, h_J) = h_J\sum_{k=1}^{2^J} f\left(a + \left(k - \tfrac{1}{2}\right)h_J\right)$, where $h_J = \dfrac{b - a}{2^J}$.

(b) Show how the sequential midpoint rule can be used in place of the sequential trapezoidal rule in Romberg integration.
Algorithms and Programs

1. Use Program 7.4 to approximate the definite integrals in Exercise 1 with an accuracy of 11 decimal places.

2. Use Program 7.4 to approximate the following two definite integrals with an accuracy of 10 decimal places. The exact value of each definite integral is $\pi$. Explain any apparent differences in the rates of convergence of the two Romberg sequences.

(a) $\int_0^2 \sqrt{4x - x^2}\,dx$    (b) $\int_0^1 \dfrac{4}{1 + x^2}\,dx$

3. The normal probability density function is $f(t) = (1/\sqrt{2\pi})e^{-t^2/2}$, and the cumulative distribution is a function defined by the integral $\Phi(x) = \frac{1}{2} + \frac{1}{\sqrt{2\pi}}\int_0^x e^{-t^2/2}\,dt$. Compute values for $\Phi(0.5)$, $\Phi(1.0)$, $\Phi(1.5)$, $\Phi(2.0)$, $\Phi(2.5)$, $\Phi(3.0)$, $\Phi(3.5)$, and $\Phi(4.0)$ that have eight digits of accuracy.

4. Modify Program 7.3 so that it will stop when consecutive values $T(K-1)$ and $T(K)$ for the sequential trapezoidal rule differ by less than $5 \times 10^{-6}$.

5. Modify Program 7.3 so that it will also compute values for the sequential Simpson and Boole rules.

6. Modify Program 7.4 so that it uses the sequential midpoint rule to perform Romberg integration (use the results of Exercise 11). Use your program to approximate the following integrals with an accuracy of 10 decimal places.

$\int \dfrac{\sin(x)}{x}\,dx$

7. In Program 7.4 the approximations to a given definite integral are stored on the main diagonal of a lower-triangular matrix. Modify Program 7.4 so that the rows of the Romberg integration tableau are sequentially computed and stored in an $n \times 1$ matrix R; hence it saves space. Test your program on the integrals in Exercise 1.

7.4 Adaptive Quadrature

The composite quadrature rules necessitate the use of equally spaced points. Typically, a small step size $h$ was used uniformly across the entire interval of integration to ensure the overall accuracy. This does not take into account that some portions of the curve may have large functional variations that require more attention than other portions of the curve. It is useful to introduce a method that adjusts the step size to be smaller over portions of the curve where a larger functional variation occurs. This technique is called adaptive quadrature. The method is based on Simpson's rule.

Simpson's rule uses two subintervals over $[a_k, b_k]$:

(1)  $S(a_k, b_k) = \dfrac{h}{3}\left(f(a_k) + 4f(c_k) + f(b_k)\right),$

where $c_k = \frac{1}{2}(a_k + b_k)$ is the center of $[a_k, b_k]$ and $h = (b_k - a_k)/2$. Furthermore, if $f \in C^4[a_k, b_k]$, then there exists a value $d_1 \in [a_k, b_k]$ so that

(2)  $\int_{a_k}^{b_k} f(x)\,dx = S(a_k, b_k) - h^5\dfrac{f^{(4)}(d_1)}{90}.$

Refinement

A composite Simpson rule using four subintervals of $[a_k, b_k]$ can be performed by bisecting this interval into two equal subintervals $[a_{k1}, b_{k1}]$ and $[a_{k2}, b_{k2}]$ and applying formula (1) recursively over each piece. Only two additional evaluations of $f(x)$ are needed, and the result is

(3)  $S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) = \dfrac{h}{6}\left(f(a_{k1}) + 4f(c_{k1}) + f(b_{k1})\right) + \dfrac{h}{6}\left(f(a_{k2}) + 4f(c_{k2}) + f(b_{k2})\right),$

where $a_{k1} = a_k$, $b_{k1} = a_{k2} = c_k$, $b_{k2} = b_k$, $c_{k1}$ is the midpoint of $[a_{k1}, b_{k1}]$, and $c_{k2}$ is the midpoint of $[a_{k2}, b_{k2}]$. In formula (3) the step size is $h/2$, which accounts for the factors $h/6$ on the right side of the equation. Furthermore, if $f \in C^4[a, b]$, there exists a value $d_2 \in [a_k, b_k]$ so that

(4)  $\int_{a_k}^{b_k} f(x)\,dx = S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) - \dfrac{h^5}{16}\dfrac{f^{(4)}(d_2)}{90}.$

Assume that $f^{(4)}(d_1) \approx f^{(4)}(d_2)$; then the right sides of equations (2) and (4) are used to obtain the relation

(5)  $S(a_k, b_k) - h^5\dfrac{f^{(4)}(d_2)}{90} \approx S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) - \dfrac{h^5}{16}\dfrac{f^{(4)}(d_2)}{90},$

which can be written as

(6)  $-h^5\dfrac{f^{(4)}(d_2)}{90} \approx \dfrac{16}{15}\left(S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) - S(a_k, b_k)\right).$

Then (6) is substituted in (4) to obtain the error estimate:

(7)  $\left|\int_{a_k}^{b_k} f(x)\,dx - S(a_{k1}, b_{k1}) - S(a_{k2}, b_{k2})\right| \approx \dfrac{1}{15}\left|S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) - S(a_k, b_k)\right|.$

Because of the assumption $f^{(4)}(d_1) \approx f^{(4)}(d_2)$, the fraction $\frac{1}{15}$ is replaced with $\frac{1}{10}$ on the right side of (7) when implementing the method. This justifies the following test.
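The quality of estimate (7) can be seen on a simple integral. The Python sketch below (Python rather than the book's MATLAB) uses the test integrand $e^x$ on $[0, 1]$, which is an assumption chosen only because its exact integral $e - 1$ is known; the helper name `simpson` is hypothetical.

```python
import math


def simpson(f, a, b):
    """Simpson's rule (1) over [a, b] with two subintervals."""
    h = (b - a) / 2.0
    return h * (f(a) + 4.0 * f((a + b) / 2.0) + f(b)) / 3.0


a, b = 0.0, 1.0
c = (a + b) / 2.0
coarse = simpson(math.exp, a, b)                          # S(a_k, b_k)
refined = simpson(math.exp, a, c) + simpson(math.exp, c, b)  # formula (3)

estimate = abs(refined - coarse) / 15.0      # right side of (7)
conservative = abs(refined - coarse) / 10.0  # fraction used when implementing
actual = abs((math.e - 1.0) - refined)       # true error of the refined value

print(f"estimate     = {estimate:.3e}")
print(f"conservative = {conservative:.3e}")
print(f"actual       = {actual:.3e}")
```

For this smooth integrand the estimate with 1/15 lands within a few percent of the true error, and the 1/10 version sits safely above it, which is the point of the replacement.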
Accuracy Test

Assume that the tolerance $\epsilon_k > 0$ is specified for the interval $[a_k, b_k]$. If

(8)  $\dfrac{1}{10}\left|S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) - S(a_k, b_k)\right| < \epsilon_k,$

we infer that

(9)  $\left|\int_{a_k}^{b_k} f(x)\,dx - S(a_{k1}, b_{k1}) - S(a_{k2}, b_{k2})\right| < \epsilon_k.$

Thus the composite Simpson rule (3) is used to approximate the integral

(10)  $\int_{a_k}^{b_k} f(x)\,dx \approx S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}),$

and the error bound for this approximation over $[a_k, b_k]$ is $\epsilon_k$.

Adaptive quadrature is implemented by applying Simpson's rules (1) and (3). Start with $\{[a_0, b_0], \epsilon_0\}$, where $\epsilon_0$ is the tolerance for numerical quadrature over $[a_0, b_0]$. The interval is refined into subintervals labeled $[a_{01}, b_{01}]$ and $[a_{02}, b_{02}]$. If the accuracy test (8) is passed, quadrature formula (3) is applied to $[a_0, b_0]$ and we are done. If the test in (8) fails, the two subintervals are relabeled $[a_1, b_1]$ and $[a_2, b_2]$, over which we use the tolerances $\epsilon_1 = \frac{1}{2}\epsilon_0$ and $\epsilon_2 = \frac{1}{2}\epsilon_0$, respectively. Thus we have two intervals with their associated tolerances to consider for further refinement and testing: $\{[a_1, b_1], \epsilon_1\}$ and $\{[a_2, b_2], \epsilon_2\}$, where $\epsilon_1 + \epsilon_2 = \epsilon_0$. If adaptive quadrature must be continued, the smaller intervals must be refined and tested, each with its own associated tolerance.

In the second step we first consider $\{[a_1, b_1], \epsilon_1\}$ and refine the interval $[a_1, b_1]$ into $[a_{11}, b_{11}]$ and $[a_{12}, b_{12}]$. If they pass the accuracy test (8) with the tolerance $\epsilon_1$, quadrature formula (3) is applied to $[a_1, b_1]$ and accuracy has been achieved over this interval. If they fail the test in (8) with the tolerance $\epsilon_1$, each subinterval $[a_{11}, b_{11}]$ and $[a_{12}, b_{12}]$ must be refined and tested in the third step with the reduced tolerance $\frac{1}{2}\epsilon_1$. Moreover, the second step involves looking at $\{[a_2, b_2], \epsilon_2\}$ and refining $[a_2, b_2]$ into $[a_{21}, b_{21}]$ and $[a_{22}, b_{22}]$. If they pass the accuracy test (8) with tolerance $\epsilon_2$, quadrature formula (3) is applied to $[a_2, b_2]$ and accuracy is achieved over this interval. If they fail the test in (8) with the tolerance $\epsilon_2$, each subinterval $[a_{21}, b_{21}]$ and $[a_{22}, b_{22}]$ must be refined and tested in the third step with the reduced tolerance $\frac{1}{2}\epsilon_2$. Therefore, the second step produces either three or four intervals, which we relabel consecutively. The three intervals would be relabeled to produce $\{\{[a_1, b_1], \epsilon_1\}, \{[a_2, b_2], \epsilon_2\}, \{[a_3, b_3], \epsilon_3\}\}$, where $\epsilon_1 + \epsilon_2 + \epsilon_3 = \epsilon_0$. In the case of four intervals, we would obtain $\{\{[a_1, b_1], \epsilon_1\}, \{[a_2, b_2], \epsilon_2\}, \{[a_3, b_3], \epsilon_3\}, \{[a_4, b_4], \epsilon_4\}\}$, where $\epsilon_1 + \epsilon_2 + \epsilon_3 + \epsilon_4 = \epsilon_0$.

If adaptive quadrature must be continued, the smaller intervals must be tested, each with its own associated tolerance. The error term in (4) shows that each time a refinement is made over a smaller subinterval there is a reduction of error by about
Table 7.8  Adaptive Quadrature Computations for $f(x) = 13(x - x^2)e^{-3x/2}$

  a_k       b_k       S(a_k1, b_k1) + S(a_k2, b_k2)    Error bound on the      Tolerance e_k
                                                       left side of (8)        for [a_k, b_k]
  0.0       0.0625     0.02287184840                   0.00000001522           0.00000015625
  0.0625    0.125      0.05948686456                   0.00000001316           0.00000015625
  0.125     0.1875     0.08434213630                   0.00000001137           0.00000015625
  0.1875    0.25       0.09969871532                   0.00000000981           0.00000015625
  0.25      0.375      0.21672136781                   0.00000025055           0.0000003125
  0.375     0.5        0.20646391592                   0.00000018402           0.0000003125
  0.5       0.625      0.17150617231                   0.00000013381           0.0000003125
  0.625     0.75       0.12433363793                   0.00000009611           0.0000003125
  0.75      0.875      0.07324515141                   0.00000006799           0.0000003125
  0.875     1.0        0.02352883215                   0.00000004718           0.0000003125
  1.0       1.125     -0.02166038952                   0.00000003192           0.0000003125
  1.125     1.25      -0.06065079384                   0.00000002084           0.0000003125
  1.25      1.5       -0.21080823822                   0.00000031714           0.000000625
  1.5       2.0       -0.60550965007                   0.00000003195           0.00000125
  2.0       2.25      -0.31985720175                   0.00000008106           0.000000625
  2.25      2.5       -0.30061749228                   0.00000008301           0.000000625
  2.5       2.75      -0.27009962412                   0.00000007071           0.000000625
  2.75      3.0       -0.23474721177                   0.00000005447           0.000000625
  3.0       3.5       -0.36389799695                   0.00000103699           0.00000125
  3.5       4.0       -0.24313827772                   0.00000041708           0.00000125
  Totals              -1.54878823413                   0.00000296809           0.00001

a factor of $\frac{1}{16}$. Thus the process will terminate after a finite number of steps. The bookkeeping for implementing the method includes a sentinel variable which indicates if a particular subinterval has passed its accuracy test. To avoid unnecessary additional evaluations of $f(x)$, the function values can be included in a data list corresponding to each subinterval. The details are shown in Program 7.6.

Example 7.16. Use adaptive quadrature to numerically approximate the value of the definite integral $\int_0^4 13(x - x^2)e^{-3x/2}\,dx$ with the starting tolerance $\epsilon_0 = 0.00001$.

Implementation of the method revealed that 20 subintervals are needed. Table 7.8 lists each interval $[a_k, b_k]$, the composite Simpson rule $S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2})$, the error bound for this approximation, and the associated tolerance $\epsilon_k$. The approximate value of the integral is obtained by summing the Simpson rule approximations to get

(11)  $\int_0^4 13(x - x^2)e^{-3x/2}\,dx \approx -1.54878823413.$

The true value of the integral is $(4108e^{-6} - 52)/27 \approx -1.54878837253$, so the error for adaptive quadrature is

$|-1.54878837253 - (-1.54878823413)| = 0.00000013840,$

which is smaller than the specified tolerance $\epsilon_0 = 0.00001$. The adaptive method involves 20 subintervals of $[0, 4]$, and 81 function evaluations were used. Figure 7.9 shows the graph of $y = f(x)$ and these 20 subintervals. The intervals are smaller where a larger functional variation occurs near the origin.

In the refinement and testing process in the adaptive method, the first four intervals were bisected into eight subintervals of width 0.03125. If this uniform spacing is continued throughout the interval $[0, 4]$, $M = 128$ subintervals are required for the composite Simpson rule, which yields the approximation $-1.54878844029$, which is in error by the amount 0.00000006776. Although the composite Simpson method contains half the error of the adaptive quadrature method, 176 more function evaluations are required. This gain of accuracy is negligible; hence there is a considerable saving of computing effort with the adaptive method. ■
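The refine-and-test process of this example can be sketched recursively in Python (Python rather than the book's MATLAB; this is not Program 7.6, and the names `simpson` and `adapt` are hypothetical). For brevity the sketch re-evaluates $f$ at shared points, so its evaluation count is higher than the 81 quoted above.

```python
import math


def simpson(f, a, b):
    """Simpson's rule (1) over [a, b] with two subintervals."""
    h = (b - a) / 2.0
    return h * (f(a) + 4.0 * f((a + b) / 2.0) + f(b)) / 3.0


def adapt(f, a, b, tol):
    """Return a list of (a_k, b_k, refined Simpson value, error bound, tolerance)."""
    c = (a + b) / 2.0
    coarse = simpson(f, a, b)
    refined = simpson(f, a, c) + simpson(f, c, b)
    err = abs(refined - coarse) / 10.0  # left side of accuracy test (8)
    if err < tol:
        return [(a, b, refined, err, tol)]
    # Test failed: bisect and give each half one half of the tolerance.
    return adapt(f, a, c, tol / 2.0) + adapt(f, c, b, tol / 2.0)


f = lambda x: 13.0 * (x - x * x) * math.exp(-1.5 * x)
pieces = adapt(f, 0.0, 4.0, 0.00001)
quad = sum(p[2] for p in pieces)
bound = sum(p[3] for p in pieces)

print("subintervals:", len(pieces))
print(f"quad  = {quad:.11f}")
print(f"bound = {bound:.11f}")
```

Because every accepted piece satisfies its own test and the tolerances of the pieces add up to the starting tolerance, the summed error bound stays below $\epsilon_0$ by construction.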


Program 7.5, srule, is a modification of Simpson's rule from Section 7.1. The output is a vector Z that contains the results of Simpson's rule on the interval $[a0, b0]$. Program 7.6 calls srule as a subroutine to carry out Simpson's rule on each of the subintervals generated by the adaptive quadrature process.
Program 7.5 (Simpson's Rule). To approximate the integral

$\int_{a0}^{b0} f(x)\,dx \approx \dfrac{h}{3}\left(f(a0) + 4f(c0) + f(b0)\right)$

by using Simpson's rule, where $c0 = (a0 + b0)/2$.

function Z=srule(f,a0,b0,tol0)
%Input  - f is the integrand input as a string 'f'
%       - a0 and b0 are upper and lower limits of integration
%       - tol0 is the tolerance
%Output - Z is a 1 x 6 vector [a0 b0 S S2 err tol1]
h=(b0-a0)/2;
C=zeros(1,3);
C=feval(f,[a0 (a0+b0)/2 b0]);
S=h*(C(1)+4*C(2)+C(3))/3;
S2=S;
tol1=tol0;
err=tol0;
Z=[a0 b0 S S2 err tol1];

Program 7.6 produces a matrix SRmat, quad (adaptive quadrature approximation to definite integral), and err (the error bound for the approximation). The rows of SRmat consist of the end points, the Simpson's rule approximation, and the error bound on each subinterval generated by the adaptive quadrature process.

Program 7.6 (Adaptive Quadrature Using Simpson's Rule). To approximate the integral

$\int_a^b f(x)\,dx \approx \sum_{k=1}^{M} \dfrac{h_k}{3}\left(f(x_{4k-4}) + 4f(x_{4k-3}) + 2f(x_{4k-2}) + 4f(x_{4k-1}) + f(x_{4k})\right).$

The composite Simpson rule is applied to the $4M$ subintervals that make up the intervals $[x_{4k-4}, x_{4k}]$, where $[a, b] = [x_0, x_{4M}]$ and $x_{4k-4+j} = x_{4k-4} + jh_k$, for each $k = 1, \ldots, M$ and $j = 1, \ldots, 4$.

function [SRmat,quad,err]=adapt(f,a,b,tol)
%Input  - f is the integrand input as a string 'f'
%       - a and b are upper and lower limits of integration
%       - tol is the tolerance
%Output - SRmat is the table of values
%       - quad is the quadrature value
%       - err is the error estimate
%Initialize values
SRmat=zeros(30,6);
iterating=0;
done=1;
SRvec=zeros(1,6);
SRvec=srule(f,a,b,tol);
SRmat(1,1:6)=SRvec;
m=1;
state=iterating;
while(state==iterating)
   n=m;
   %Assume this pass is the last; a failed accuracy test resets state
   state=done;
   for j=n:-1:1
      p=j;
      SR0vec=SRmat(p,:);
      err=SR0vec(5);
      tol=SR0vec(6);
      if (tol<=err)
         %Bisect interval, apply Simpson's rule
         %recursively, and determine error
         SR1vec=SR0vec;
         SR2vec=SR0vec;
         a=SR0vec(1);
         b=SR0vec(2);
         c=(a+b)/2;
         err=SR0vec(5);
         tol=SR0vec(6);
         tol2=tol/2;
         SR1vec=srule(f,a,c,tol2);
         SR2vec=srule(f,c,b,tol2);
         err=abs(SR0vec(3)-SR1vec(3)-SR2vec(3))/10;
         %Accuracy test
         if (err<tol)
            SRmat(p,:)=SR0vec;
            SRmat(p,4)=SR1vec(3)+SR2vec(3);
            SRmat(p,5)=err;
         else
            SRmat(p+1:m+1,:)=SRmat(p:m,:);
            m=m+1;
            SRmat(p,:)=SR1vec;
            SRmat(p+1,:)=SR2vec;
            state=iterating;
         end
      end
   end
end
quad=sum(SRmat(:,4));
err=sum(abs(SRmat(:,5)));
SRmat=SRmat(1:m,1:6);

Algorithms and Programs

1. Use Program 7.6 to approximate the value of the definite integral. Use the starting tolerance $\epsilon_0 = 0.00001$.

(a) $\int_0^3 \dfrac{\sin(2x)}{1 + x^5}\,dx$    (b) $\int_0^3 \sin(4x)e^{-2x}\,dx$    (c) $\int_{0.04}^1 \dfrac{1}{\sqrt{x}}\,dx$

(d) $\int_0^2 \dfrac{1}{x^2 + \frac{1}{10}}\,dx$    (e) $\int_{1/(2\pi)}^2 \sin\left(\dfrac{1}{x}\right)dx$    (f) $\int_0^2 \sqrt{4x - x^2}\,dx$

2. For each of the definite integrals in Problem 1 construct a graph analogous to Figure 7.9. Hint. The first column of SRmat contains the end points (except for b) of the subintervals from the adaptive quadrature process. If T=SRmat(:,1) and Z=zeros(length(T))', then plot(T,Z,'.') will produce the subintervals (except for the right end point b).

3. Modify Program 7.6 so that Boole's rule is used in each subinterval $[a_k, b_k]$.

4. Use the modified program in Problem 3 to compute approximations and construct graphs analogous to Figure 7.9 for the definite integrals in Problem 1.


7.5 Gauss-Legendre Integration (Optional)

We wish to find the area under the curve

$y = f(x), \quad -1 \le x \le 1.$

What method gives the best answer if only two function evaluations are to be made? We have already seen that the trapezoidal rule is a method for finding the area under the curve and that it uses two function evaluations at the end points $(-1, f(-1))$ and $(1, f(1))$. But if the graph of $y = f(x)$ is concave down, the error in approximation is the entire region that lies between the curve and the line segment joining the points (see Figure 7.10(a)).

If we can use nodes $x_1$ and $x_2$ that lie inside the interval $[-1, 1]$, the line through the two points $(x_1, f(x_1))$ and $(x_2, f(x_2))$ crosses the curve, and the area under the line
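Theorem 7.8 (Gauss-Legendre Two-Point Rule). If $f$ is continuous on $[-1, 1]$, then

(13)  $\int_{-1}^{1} f(x)\,dx \approx G_2(f) = f\left(\dfrac{-1}{\sqrt{3}}\right) + f\left(\dfrac{1}{\sqrt{3}}\right).$

The Gauss-Legendre rule $G_2(f)$ has degree of precision $n = 3$. If $f \in C^4[-1, 1]$, then

(14)  $\int_{-1}^{1} f(x)\,dx = f\left(\dfrac{-1}{\sqrt{3}}\right) + f\left(\dfrac{1}{\sqrt{3}}\right) + E_2(f),$

where

(15)  $E_2(f) = \dfrac{f^{(4)}(c)}{135}.$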

Example 7.17. Use the two-point Gauss-Legendre rule to approximate

$\int_{-1}^{1} \dfrac{dx}{x + 2} = \ln(3) - \ln(1) \approx 1.09861$

and compare the result with the trapezoidal rule $T(f, h)$ with $h = 2$ and Simpson's rule $S(f, h)$ with $h = 1$.

Let $G_2(f)$ denote the two-point Gauss-Legendre rule; then

$G_2(f) = f(-0.57735) + f(0.57735) = 0.70291 + 0.38800 = 1.09091,$

Table 7.9  Gauss-Legendre Abscissas and Weights

$\int_{-1}^{1} f(x)\,dx = \sum_{k=1}^{N} w_{N,k}f(x_{N,k}) + E_N(f)$

  N    Abscissas, x_{N,k}    Weights, w_{N,k}    Truncation error, E_N(f)
  2    -0.5773502692         1.0000000000        f^{(4)}(c) / 135
        0.5773502692         1.0000000000
  3    ±0.7745966692         0.5555555556        f^{(6)}(c) / 15,750
        0.0000000000         0.8888888888
  4    ±0.8611363116         0.3478548451        f^{(8)}(c) / 3,472,875
       ±0.3399810436         0.6521451549
  5    ±0.9061798459         0.2369268851        f^{(10)}(c) / 1,237,732,650
       ±0.5384693101         0.4786286705
        0.0000000000         0.5688888888
  6    ±0.9324695142         0.1713244924        f^{(12)}(c) 2^{13} (6!)^4 / ((12!)^3 · 13)
       ±0.6612093865         0.3607615730
       ±0.2386191861         0.4679139346
  7    ±0.9491079123         0.1294849662        f^{(14)}(c) 2^{15} (7!)^4 / ((14!)^3 · 15)
       ±0.7415311856         0.2797053915
       ±0.4058451514         0.3818300505
        0.0000000000         0.4179591837
  8    ±0.9602898565         0.1012285363        f^{(16)}(c) 2^{17} (8!)^4 / ((16!)^3 · 17)
       ±0.7966664774         0.2223810345
       ±0.5255324099         0.3137066459
       ±0.1834346425         0.3626837834

$T(f, 2) = f(-1.00000) + f(1.00000)$
$\qquad\quad\; = 1.00000 + 0.33333 = 1.33333,$

$S(f, 1) = \frac{f(-1) + 4f(0) + f(1)}{3} = \frac{1 + 2 + \frac{1}{3}}{3} = 1.11111.$



The errors are 0.00770, -0.23472, and -0.01250, respectively, so the Gauss-Legendre
rule is seen to be best. Notice that the Gauss-Legendre rule required only two function
evaluations and Simpson's rule required three. In this example the size of the error for
$G_2(f)$ is about 61% of the size of the error for $S(f, 1)$. ■
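
The comparison in Example 7.17 can be reproduced with a few lines of MATLAB. This is only a sketch: the variable names (`G2`, `T`, `S`, `errG2`, and so on) are chosen here for illustration and do not come from the text, and each error is taken as the exact value minus the approximation, which matches the signs quoted in the example.

```matlab
% Integrand of Example 7.17 on [-1, 1]
f = @(x) 1 ./ (x + 2);
exact = log(3) - log(1);

% Two-point Gauss-Legendre rule: nodes -1/sqrt(3) and 1/sqrt(3), both weights 1
G2 = f(-1/sqrt(3)) + f(1/sqrt(3));

% Trapezoidal rule with h = 2 and Simpson's rule with h = 1
T = f(-1) + f(1);
S = (f(-1) + 4*f(0) + f(1)) / 3;

% Errors, taken as exact value minus approximation
errG2 = exact - G2;
errT  = exact - T;
errS  = exact - S;
fprintf('G2 = %.5f, T = %.5f, S = %.5f\n', G2, T, S);
fprintf('errors: %.5f, %.5f, %.5f\n', errG2, errT, errS);
```

The ratio `abs(errG2)/abs(errS)` gives the "about 61%" figure mentioned in the example.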

The general N-point Gauss-Legendre rule is exact for polynomial functions of
degree $\le 2N - 1$, and the numerical integration formula is

(16)    $G_N(f) = w_{N,1} f(x_{N,1}) + w_{N,2} f(x_{N,2}) + \cdots + w_{N,N} f(x_{N,N}).$

The abscissas $x_{N,k}$ and weights $w_{N,k}$ to be used have been tabulated and are easily
available; Table 7.9 gives the values up to eight points. Also included in the table is
the form of the error term $E_N(f)$ that corresponds to $G_N(f)$, and it can be used to
determine the accuracy of the Gauss-Legendre integration formula.

The values in Table 7.9 in general have no easy representation. This fact makes the
method less attractive for humans to use when hand calculations are required. But once
the values are stored in a computer it is easy to call them up when needed. The nodes
are actually roots of the Legendre polynomials, and the corresponding weights must
be obtained by solving a system of equations. For the three-point Gauss-Legendre rule
the nodes are $-(0.6)^{1/2}$, $0$, and $(0.6)^{1/2}$, and the corresponding weights are 5/9, 8/9,
and 5/9.
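
As a quick numerical check of the degree of precision, the sketch below applies the three-point nodes and weights quoted above to the monomials $x^n$ on $[-1, 1]$. The names `A`, `W`, and `err` are illustrative assumptions. The rule should reproduce the exact integrals for $n = 0, \ldots, 5$ (degree $\le 2N - 1 = 5$) and fail for $n = 6$.

```matlab
% Three-point Gauss-Legendre nodes and weights on [-1, 1]
A = [-sqrt(0.6), 0, sqrt(0.6)];
W = [5/9, 8/9, 5/9];

err = zeros(1, 7);
for n = 0:6
    approx = sum(W .* A.^n);               % G_3 applied to x^n
    exact  = (1 - (-1)^(n+1)) / (n + 1);   % integral of x^n over [-1, 1]
    err(n+1) = approx - exact;
end
disp(err)   % entries for n = 0..5 are at round-off level; n = 6 is not
```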
Example 7.19. Use the three-point Gauss-Legendre rule to approximate

$\int_{1}^{5} \frac{dt}{t} = \ln(5) - \ln(1) \approx 1.609438$

and compare the result with Boole's rule B(2) with h = 1.

Here a = 1 and b = 5, so the rule in (22) yields

$G_3(f) = 2\,\frac{5 f(3 - 2(0.6)^{1/2}) + 8 f(3 + 0) + 5 f(3 + 2(0.6)^{1/2})}{9}$

$\qquad\;\; = 2\,\frac{3.446359 + 2.666667 + 1.099096}{9} = 1.602694.$



In Example 7.13 we saw that Boole's rule gave B(2) = 1.617778. The errors are
0.006744 and -0.008340, respectively, so that the Gauss-Legendre rule is slightly better
in this case. Notice that the Gauss-Legendre rule requires three function evaluations and
Boole's rule requires five. In this example the size of the two errors is about the same. ■
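
The calculation in Example 7.19 can be sketched in MATLAB by mapping the three nodes from $[-1, 1]$ to $[a, b]$ and scaling the weighted sum by $(b - a)/2$. The variable names below are illustrative, not taken from the text.

```matlab
% Three-point Gauss-Legendre rule applied to the integral of 1/t over [1, 5]
f = @(t) 1 ./ t;
a = 1;  b = 5;

A = [-sqrt(0.6), 0, sqrt(0.6)];     % nodes on [-1, 1]
W = [5/9, 8/9, 5/9];                % corresponding weights

T  = (a + b)/2 + ((b - a)/2) * A;   % nodes translated to [a, b]
G3 = ((b - a)/2) * sum(W .* f(T));  % quadrature value
err = log(5) - G3;                  % exact value minus approximation
fprintf('G3 = %.6f, error = %.6f\n', G3, err);
```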

Gauss-Legendre integration formulas are extremely accurate, and they should be 


this case, proceed as follows. Pick a few representative integrals, including some with
the worst behavior that is likely to occur. Determine the number of sample points
N that is needed to obtain the required accuracy. Then fix the value N, and use the
Gauss-Legendre rule with N sample points for all the integrals.

For a given value of N, Program 7.7 requires that the abscissas and weights from
Table 7.9 be saved in 1 x N matrices A and W, respectively. This can be done in
the MATLAB command window or the matrices can be saved as M-files. It would
be expedient to save Table 7.9 in a 35 x 2 matrix G. The first column of G would
contain the abscissas and the second column the corresponding weights. Then, for a
given value of N, the matrices A and W would be submatrices of G. For example, if
N = 3, then A=G(3:5,1)' and W=G(3:5,2)'.
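
The row range for a given N follows from the layout just described: the rules for N = 2, 3, ..., 8 occupy 2 + 3 + ... + 8 = 35 rows in order, so the block for N starts at row N(N-1)/2. The sketch below is an illustration only; it stores just the first five rows of G (the N = 2 and N = 3 entries of Table 7.9), and the index names `first` and `last` are assumptions introduced here.

```matlab
% First five rows of the matrix G described above: [abscissa, weight]
% (rows 1-2 hold the N = 2 rule, rows 3-5 hold the N = 3 rule)
G = [-0.5773502692  1.0000000000
      0.5773502692  1.0000000000
     -0.7745966692  0.5555555556
      0.0000000000  0.8888888888
      0.7745966692  0.5555555556];

N = 3;
first = N*(N - 1)/2;       % first row of the N-point rule
last  = first + N - 1;     % last row of the N-point rule

A = G(first:last, 1)';     % 1 x N vector of abscissas
W = G(first:last, 2)';     % 1 x N vector of weights
```

With the full 35 x 2 matrix, the same two index formulas give rows 28 through 35 for N = 8.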

Program 7.7 (Gauss-Legendre Quadrature). To approximate the integral

$\int_{a}^{b} f(x)\,dx \approx \frac{b - a}{2} \sum_{k=1}^{N} w_{N,k} f(t_{N,k})$

by sampling f(x) at the N unequally spaced points $\{t_{N,k}\}_{k=1}^{N}$. The changes of variable

$t = \frac{a + b}{2} + \frac{b - a}{2}\,x$  and  $dt = \frac{b - a}{2}\,dx$

are used. The abscissas $\{x_{N,k}\}_{k=1}^{N}$ and the corresponding weights $\{w_{N,k}\}_{k=1}^{N}$ must
be obtained from a table of known values.

function quad=gauss(f,a,b,A,W)
%Input  - f is the integrand input as a string 'f'
%       - a and b are the lower and upper limits of integration
%       - A is the 1 x N vector of abscissas from Table 7.9
%       - W is the 1 x N vector of weights from Table 7.9
%Output - quad is the quadrature value
N=length(A);
T=zeros(1,N);
% Map the abscissas from [-1,1] to [a,b]
T=((a+b)/2)+((b-a)/2)*A;
quad=((b-a)/2)*sum(W.*feval(f,T));


Exercises for Gauss-Legendre Integration (Optional) 

In Exercises 1 through 4, show that the two integrals are equivalent and calculate $G_2(f)$.

1. $\int_{0}^{2} 6t^5\,dt = \int_{-1}^{1} 6(x + 1)^5\,dx$

2. $\int_{0}^{2} \sin(t)\,dt = \int_{-1}^{1} \sin(x + 1)\,dx$


--iK 


3. / A . I dx 4 * / ' e-^dt := ' / 

Jo f J-1 * + 1 v5ir Jo n/2jt J -1 2 

5. $\frac{1}{\pi}\int_{0}^{\pi} \cos(0.6\sin(t))\,dt = 0.5\int_{-1}^{1} \cos\left(0.6\sin\left((x + 1)\frac{\pi}{2}\right)\right)dx$

6. Use $E_N(f)$ in Table 7.9 and the change of variable given in Theorem 7.10 to find the
smallest integer N so that $E_N(f) = 0$ for

(a) $\int_{0}^{2} 8x^7\,dx = 256 = G_N(f)$.

(b) $\int_{0}^{2} 11x^{10}\,dx = 2048 = G_N(f)$.

7. Find the roots of the following Legendre polynomials and compare them with the
abscissas in Table 7.9.

(a) $P_2(x) = (3x^2 - 1)/2$

(b) $P_3(x) = (5x^3 - 3x)/2$

(c) $P_4(x) = (35x^4 - 30x^2 + 3)/8$

8. The truncation error term for the two-point Gauss-Legendre rule on the closed
interval $[-1, 1]$ is $f^{(4)}(c_1)/135$. The truncation error for Simpson's rule on $[a, b]$ is
$-h^5 f^{(4)}(c_2)/90$. Compare the truncation error terms when $[a, b] = [-1, 1]$. Which
method do you think is best? Why?

9. The three-point Gauss-Legendre rule is

$\int_{-1}^{1} f(x)\,dx \approx \frac{5f(-(0.6)^{1/2}) + 8f(0) + 5f((0.6)^{1/2})}{9}.$

Show that the formula is exact for $f(x) = 1, x, x^2, x^3, x^4, x^5$. Hint. If f is an odd
function (i.e., $f(-x) = -f(x)$), the integral of f over $[-1, 1]$ is zero.

10. The truncation error term for the three-point Gauss-Legendre rule on the interval
$[-1, 1]$ is $f^{(6)}(c_1)/15{,}750$. The truncation error term for Boole's rule on $[a, b]$ is
$-8h^7 f^{(6)}(c_2)/945$. Compare the error terms when $[a, b] = [-1, 1]$. Which method
is better? Why?

11. Derive the three-point Gauss-Legendre rule using the following steps. Use the fact
that the abscissas are the roots of the Legendre polynomial of degree 3,

$x_1 = -(0.6)^{1/2}, \quad x_2 = 0, \quad x_3 = (0.6)^{1/2},$

and find the weights $w_1$, $w_2$, $w_3$ so that the relation

$\int_{-1}^{1} f(x)\,dx \approx w_1 f(-(0.6)^{1/2}) + w_2 f(0) + w_3 f((0.6)^{1/2})$

is exact for the functions $f(x) = 1$, $x$, and $x^2$. Hint. First obtain, and then solve, the
linear system of equations

$w_1 + w_2 + w_3 = 2$
$-(0.6)^{1/2} w_1 + (0.6)^{1/2} w_3 = 0$
$0.6 w_1 + 0.6 w_3 = \frac{2}{3}.$

12. In practice, if many integrals of a similar type are evaluated, a preliminary analysis is
made to determine the number of function evaluations required to obtain the desired
accuracy. Suppose that 17 function evaluations are to be made. Compare the Romberg
answer $R(4, 4)$ with the Gauss-Legendre answer $G_{17}(f)$.


Algorithms and Programs 


1. For each of the integrals in Exercises 1 through 5, use Program 7.7 to find $G_6(f)$,
$G_7(f)$, and $G_8(f)$.

2. (a) Modify Program 7.7 so that it will compute $G_1(f), G_2(f), \ldots, G_8(f)$ and stop
when the relative error in the approximations $G_{N-1}(f)$ and $G_N(f)$ is less than
the preassigned value tol, that is,

$\frac{2\,|G_{N-1}(f) - G_N(f)|}{|G_{N-1}(f) + G_N(f)|} < \text{tol}.$

Hint. As discussed at the end of the section, save Table 7.9 in an M-file G as a
35 x 2 matrix G.

(b) Use your program from part (a) to approximate the integrals in Exercises 1
through 5 with an accuracy of five decimal places.
3. (a) Use the six-point Gauss-Legendre rule to approximate the solution of the
integral equation


w(jc) = x 2 +0.1 / (x 2 + 

JQ 

Substitute your approximate solution into the right-hand side of the integral 
equation and simplify. 

(b) Repeat part (a) using an eight-point Gauss-Legendre rule. 





Numerical 

Optimization 


The two-dimensional wave equation is used in mechanical engineering to model
vibrations in rectangular plates. If the plates have all four edges clamped, the sinusoidal
vibrations are described with a double Fourier series. Suppose that at a certain instant
of time the height $z = f(x, y)$ over the point $(x, y)$ is given by the function

$z = f(x, y) = 0.02\sin(x)\sin(y) - 0.03\sin(2x)\sin(y)$
$\qquad\qquad\quad + 0.04\sin(x)\sin(2y) + 0.08\sin(2x)\sin(2y).$

Figure 8.1 (a) The displacement $z = f(x, y)$ of a vibrating plate. (b) The contour plot
$f(x, y) = C$ for a vibrating plate.

Where are the points of maximum deflection located? Looking at the three-dimensional
graph and the companion contour plot in Figure 8.1(a) and (b), respectively, we
see that there are two local minima and two local maxima over the square $0 \le x \le \pi$,
$0 \le y \le \pi$. Numerical methods can be used to determine their approximate locations:

$f(0.8278, 2.3322) = -0.1200$ and $f(2.5351, 0.6298) = -0.0264$

are the local minima, and

$f(0.9241, 0.7640) = 0.0998$ and $f(2.3979, 2.2287) = 0.0853$

are the local maxima.

In this chapter we give a brief introduction to some of the basic methods for
locating extrema of functions of one or several variables.

Minimization of a Function 

Definition 8.1 (Local Extremum). The function f is said to have a local minimum
value at $x = p$ if there exists an open interval I containing p so that $f(p) \le f(x)$ for
all $x \in I$. Similarly, f is said to have a local maximum value at $x = p$ if $f(x) \le f(p)$
for all $x \in I$. If f has either a local minimum or maximum value at $x = p$, it is said
to have a local extremum at $x = p$.

Definition 8.2 (Increasing and Decreasing). Assume that f(x) is defined on the
interval I.

(i) If $x_1 < x_2$ implies that $f(x_1) < f(x_2)$ for all $x_1, x_2 \in I$, then f is said to be
increasing on I.

(ii) If $x_1 < x_2$ implies that $f(x_1) > f(x_2)$ for all $x_1, x_2 \in I$, then f is said to be
decreasing on I.

Theorem 8.1. Suppose that f(x) is continuous on $I = [a, b]$ and is differentiable on
$(a, b)$.

(i) If $f'(x) > 0$ for all $x \in (a, b)$, then f(x) is increasing on I.

(ii) If $f'(x) < 0$ for all $x \in (a, b)$, then f(x) is decreasing on I.

Theorem 8.2. Assume that f(x) is defined on $I = [a, b]$ and has a local extremum
at an interior point $p \in (a, b)$. If f(x) is differentiable at $x = p$, then $f'(p) = 0$.

Theorem 8.3 (First Derivative Test). Assume that f(x) is continuous on $I = [a, b]$.
Furthermore, suppose that $f'(x)$ is defined for all $x \in (a, b)$, except possibly at $x = p$.

(i) If $f'(x) < 0$ on $(a, p)$ and $f'(x) > 0$ on $(p, b)$, then f(p) is a local minimum.

(ii) If $f'(x) > 0$ on $(a, p)$ and $f'(x) < 0$ on $(p, b)$, then f(p) is a local maximum.

Theorem 8.4 (Second Derivative Test). Assume that f is continuous on $[a, b]$ and
$f'$ and $f''$ are defined on $(a, b)$. Also, suppose that $p \in (a, b)$ is a critical point where
$f'(p) = 0$.

(i) If $f''(p) > 0$, then f(p) is a local minimum of f.

(ii) If $f''(p) < 0$, then f(p) is a local maximum of f.

(iii) If $f''(p) = 0$, then this test is inconclusive.

Example 8.1. Use the second derivative test to classify the local extrema of
$f(x) = x^3 + x^2 - x + 1$ on the interval $[-2, 2]$.

The first derivative is $f'(x) = 3x^2 + 2x - 1 = (3x - 1)(x + 1)$, and the second
derivative is $f''(x) = 6x + 2$. There are two points where $f'(x) = 0$ (i.e., $x = 1/3, -1$).

Case (i): At $x = 1/3$ we find that $f'(1/3) = 0$ and $f''(1/3) = 4 > 0$, so that f(x) has
a local minimum at $x = 1/3$.

Case (ii): At $x = -1$ we find that $f'(-1) = 0$ and $f''(-1) = -4 < 0$, so that f(x)
has a local maximum at $x = -1$. ■
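
For a polynomial such as the one in Example 8.1, the second derivative test can be sketched with the standard MATLAB polynomial functions `polyder`, `roots`, and `polyval`. The variable names are illustrative, and the approach assumes the critical points are real, which is the case here.

```matlab
% f(x) = x^3 + x^2 - x + 1, stored by coefficients (highest power first)
p   = [1 1 -1 1];
dp  = polyder(p);            % f'(x)  = 3x^2 + 2x - 1
d2p = polyder(dp);           % f''(x) = 6x + 2

crit = sort(roots(dp));      % critical points: -1 and 1/3

for k = 1:numel(crit)
    s = polyval(d2p, crit(k));   % sign of f'' decides the classification
    if s > 0
        fprintf('x = %.4f: local minimum (f'''' = %g)\n', crit(k), s);
    elseif s < 0
        fprintf('x = %.4f: local maximum (f'''' = %g)\n', crit(k), s);
    else
        fprintf('x = %.4f: test inconclusive\n', crit(k));
    end
end
```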

Search Method 

Another method for finding the minimum of f(x) is to evaluate the function many
times and search for a local minimum. To reduce the number of function evaluations,
it is important to have a good strategy for determining where f(x) is evaluated. One
of the most efficient methods is called the golden ratio search, which is named for the
ratio's involvement in selecting the points.

The Golden Ratio 

Let the initial interval be $[0, 1]$. If $0.5 < r < 1$, then $0 < 1 - r < 0.5$ and the interval
is divided into three subintervals $[0, 1 - r]$, $[1 - r, r]$, and $[r, 1]$. A decision process
is used to either squeeze from the right and get the new interval $[0, r]$ or squeeze from
the left and get $[1 - r, 1]$. Then this new subinterval is divided into three subintervals
in the same ratio as was $[0, 1]$.

We want to choose r so that one of the old points will be in the correct position
with respect to the new interval as shown in Figure 8.2. This implies that the ratio
$(1 - r) : r$ be the same as $r : 1$. Hence r satisfies the equation $1 - r = r^2$, which
can be expressed as a quadratic equation $r^2 + r - 1 = 0$. The solution r satisfying
$0.5 < r < 1$ is found to be $r = (\sqrt{5} - 1)/2$.
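
The reuse property can be checked numerically. The sketch below (variable names are illustrative) confirms that $r = (\sqrt{5} - 1)/2$ satisfies the quadratic and that, after either squeeze, one of the new interior points coincides with an old one.

```matlab
r = (sqrt(5) - 1)/2;
residual = r^2 + r - 1;            % should vanish up to round-off

% Squeeze from the right: new interval [0, r], new interior points
rightNewC = r*(1 - r);
rightNewD = r*r;                   % coincides with the old point 1 - r

% Squeeze from the left: new interval [1 - r, 1] of length r
leftNewC = (1 - r) + (1 - r)*r;    % coincides with the old point r
leftNewD = (1 - r) + r*r;

fprintf('r = %.7f, residual = %.2e\n', r, residual);
```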

To use the golden search for finding the minimum of f(x), a special condition
must be met to ensure that there is a proper minimum in the interval.

Figure 8.2 The intervals involved in the golden ratio search. Left panel: squeeze from
the right and the new interval is $[0, r]$. Right panel: squeeze from the left and the new
interval is $[1 - r, 1]$.

Table 8.1 Secant Method for Solving $f'(x) = 2x - \cos(x) = 0$

k    p_k          2p_k - cos(p_k)
0    0.0000000    -1.00000000
1    1.0000000     1.45969769
2    0.4065540    -0.10538092
3    0.4465123    -0.00893398
4    0.4502137     0.00007329
5    0.4501836    -0.00000005

Figure 8.3 The decision process for the golden ratio search. Left panel: if $f(c) \le f(d)$,
then squeeze from the right and use $[a, d]$. Right panel: if $f(d) < f(c)$, then squeeze
from the left and use $[c, b]$.


Table 8.2 Golden Search for the Minimum of $f(x) = x^2 - \sin(x)$

k     a_k          c_k          d_k          b_k          f(c_k)         f(d_k)
0     0.0000000    0.3819660    0.6180340    1.0000000    -0.22684748    -0.19746793
1     0.0000000    0.2360680    0.3819660    0.6180340    -0.17815339    -0.22684748
2     0.2360680    0.3819660    0.4721360    0.6180340    -0.22684748    -0.23187724
3     0.3819660    0.4721360    0.5278640    0.6180340    -0.23187724    -0.22504882
4     0.3819660    0.4376941    0.4721360    0.5278640    -0.23227594    -0.23187724
5     0.3819660    0.4164079    0.4376941    0.4721360    -0.23108238    -0.23227594
6     0.4164079    0.4376941    0.4508497    0.4721360    -0.23227594    -0.23246503
...
21    0.4501574    0.4501730    0.4501827    0.4501983    -0.23246558    -0.23246558
22    0.4501730    0.4501827    0.4501886    0.4501983    -0.23246558    -0.23246558
23    0.4501827    0.4501886    0.4501923    0.4501983    -0.23246558    -0.23246558


Definition 8.3 (Unimodal Function). The function f(x) is unimodal on $I = [a, b]$
if there exists a unique number $p \in I$ such that

(1) f(x) is decreasing on $[a, p]$,

(2) f(x) is increasing on $[p, b]$.

If f(x) is known to be unimodal on $[a, b]$, it is possible to replace the interval with
a subinterval on which f(x) takes on its minimum value. The golden search requires
that two interior points $c = a + (1 - r)(b - a)$ and $d = a + r(b - a)$ be used, where
r is the golden ratio mentioned above. This results in $a < c < d < b$. The condition
that f(x) is unimodal guarantees that the function values f(c) and f(d) are less than
$\max\{f(a), f(b)\}$. We have two cases to consider (see Figure 8.3).

If $f(c) \le f(d)$, the minimum must occur in the subinterval $[a, d]$, and we replace b
with d and continue the search in the new subinterval. If $f(d) < f(c)$, the minimum
must occur in $[c, b]$, and we replace a with c and continue the search. The next example
compares the root-finding method with the golden search method.


Example 8.2. Find the minimum of the unimodal function $f(x) = x^2 - \sin(x)$ on $[0, 1]$.

Solution by solving $f'(x) = 0$. A root-finding method can be used to determine where
the derivative $f'(x) = 2x - \cos(x)$ is zero. Since $f'(0) = -1$ and $f'(1) = 1.4596977$,
a root of $f'(x)$ lies in the interval $[0, 1]$. Starting with $p_0 = 0$ and $p_1 = 1$, Table 8.1 shows
the iterations.

The conclusion from applying the secant method is that $f'(0.4501836) = 0$. The
second derivative is $f''(x) = 2 + \sin(x)$, and we compute $f''(0.4501836) = 2.435131 > 0$.
Hence the minimum value is $f(0.4501836) = -0.2324656$.

Solution using the golden search. At each step, the function values f(c) and f(d)
are compared and a decision is made as to whether to continue the search in $[a, d]$ or $[c, b]$.
Some of the computations are shown in Table 8.2.

At the twenty-third iteration the interval has been narrowed down to $[a_{23}, b_{23}] =
[0.4501827, 0.4501983]$. This interval has width 0.0000156. However, the computed
function values at the end points agree to eight decimal places (i.e., $f(a_{23}) \approx -0.23246558 \approx
f(b_{23})$); hence the algorithm is terminated. A problem in using search methods is that the
function may be flat near the minimum, and this limits the accuracy that can be obtained.
The secant method was able to find the more accurate answer $p_5 = 0.4501836$.

Although the golden search is slower in this example, it has the desirable feature that
it can be applied in cases where f(x) is not differentiable. ■
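
The golden search half of Example 8.2 can be sketched as a short script. This is a simplified illustration rather than Program 8.1: it stops on interval width only, the tolerance `tol` is an assumed value, and the variable names are chosen here. Note that each pass through the loop costs just one new function evaluation, because one interior point is reused.

```matlab
f = @(x) x.^2 - sin(x);
a = 0;  b = 1;
r = (sqrt(5) - 1)/2;       % golden ratio
tol = 1e-6;                % assumed tolerance on the interval width

c = a + (1 - r)*(b - a);   fc = f(c);
d = a + r*(b - a);         fd = f(d);

while (b - a) > tol
    if fc <= fd
        % squeeze from the right: keep [a, d]; old c becomes the new d
        b = d;  d = c;  fd = fc;
        c = a + (1 - r)*(b - a);  fc = f(c);
    else
        % squeeze from the left: keep [c, b]; old d becomes the new c
        a = c;  c = d;  fc = fd;
        d = a + r*(b - a);  fd = f(d);
    end
end

p = (a + b)/2;             % midpoint of the final bracket
fprintf('p = %.7f, f(p) = %.8f\n', p, f(p));
```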


Finding Extreme Values of f(x, y)

Definition 8.1 is easily extended to functions of several variables. Suppose that f(x, y)
is defined in the region

(3)    $R = \{(x, y) : (x - p)^2 + (y - q)^2 < r^2\}.$

The function f(x, y) has a local minimum at $(p, q)$ provided that

(4)    $f(p, q) \le f(x, y)$ for each point $(x, y) \in R$.

The function f(x, y) has a local maximum at $(p, q)$ provided that

(5)    $f(x, y) \le f(p, q)$ for each point $(x, y) \in R$.

The second derivative test for an extreme value is an extension of Theorem 8.4.

Theorem 8.5 (Second Derivative Test). Assume that f(x, y) and its first- and
second-order partial derivatives are continuous on a region R. Suppose that $(p, q) \in R$
is a critical point where both $f_x(p, q) = 0$ and $f_y(p, q) = 0$. The higher-order partial
derivatives are used to determine the nature of the critical point.

(i) If $f_{xx}(p, q) f_{yy}(p, q) - f_{xy}^2(p, q) > 0$ and $f_{xx}(p, q) > 0$, then f(p, q) is a
local minimum of f.

(ii) If $f_{xx}(p, q) f_{yy}(p, q) - f_{xy}^2(p, q) > 0$ and $f_{xx}(p, q) < 0$, then f(p, q) is a
local maximum of f.

(iii) If $f_{xx}(p, q) f_{yy}(p, q) - f_{xy}^2(p, q) < 0$, then f(x, y) does not have a local
extremum at $(p, q)$.

(iv) If $f_{xx}(p, q) f_{yy}(p, q) - f_{xy}^2(p, q) = 0$, this test is inconclusive.

Example 8.3. Find the minimum of $f(x, y) = x^2 - 4x + y^2 - y - xy$.

The first-order partial derivatives are

(6)    $f_x(x, y) = 2x - 4 - y$ and $f_y(x, y) = 2y - 1 - x$.

Setting these partial derivatives equal to zero yields the linear system

(7)    $2x - y = 4$
       $-x + 2y = 1.$

The solution to (7) is $(x, y) = (3, 2)$. The second-order partial derivatives of f(x, y) are

$f_{xx}(x, y) = 2$, $f_{yy}(x, y) = 2$, and $f_{xy}(x, y) = -1$.

It is easy to see that we have case (i) of Theorem 8.5, that is,

$f_{xx}(3, 2) f_{yy}(3, 2) - f_{xy}^2(3, 2) = 3 > 0$ and $f_{xx}(3, 2) = 2 > 0$.

Hence f(x, y) has a local minimum $f(3, 2) = -7$ at the point $(3, 2)$. ■
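
The steps of Example 8.3 can be sketched numerically: solve the linear system (7) with the backslash operator, then evaluate the discriminant from Theorem 8.5. The names `H`, `rhs`, `P`, and `D` are illustrative only.

```matlab
f = @(x, y) x.^2 - 4*x + y.^2 - y - x.*y;

% Linear system (7): 2x - y = 4, -x + 2y = 1
H   = [2 -1; -1 2];
rhs = [4; 1];
P   = H \ rhs;                 % critical point (x, y)

% Second-order partial derivatives are constant for this quadratic
fxx = 2;  fyy = 2;  fxy = -1;
D = fxx*fyy - fxy^2;           % discriminant of Theorem 8.5

if D > 0 && fxx > 0
    fprintf('local minimum f(%g, %g) = %g\n', P(1), P(2), f(P(1), P(2)));
end
```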




The Nelder-Mead Method 

A simplex method for finding a local minimum of a function of several variables has 
been devised by Nelder and Mead. For two variables, a simplex is a triangle, and 
the method is a pattern search that compares function values at the three vertices of a 
triangle. The worst vertex, where f(x, y) is largest, is rejected and replaced with a new
vertex. A new triangle is formed and the search is continued. The process generates 
a sequence of triangles (which might have different shapes), for which the function 
values at the vertices get smaller and smaller. The size of the triangles is reduced and 
the coordinates of the minimum point are found. 

The algorithm is stated using the term simplex (a generalized triangle in N di¬ 
mensions) and will find the minimum of a function of N variables. It is effective and 
computationally compact. 


The Initial Triangle BGW 

Let f(x, y) be the function that is to be minimized. To start, we are given three vertices
of a triangle: $V_k = (x_k, y_k)$, $k = 1, 2, 3$. The function f(x, y) is then evaluated at each
of the three points: $z_k = f(x_k, y_k)$ for $k = 1, 2, 3$. The subscripts are then reordered so
that $z_1 \le z_2 \le z_3$. We use the notation

(8)    $B = (x_1, y_1), \quad G = (x_2, y_2), \quad \text{and} \quad W = (x_3, y_3)$

to help remember that B is the best vertex, G is good (next to best), and W is the worst
vertex.


Midpoint of the Good Side

The construction process uses the midpoint of the line segment joining B and G. It is
found by averaging the coordinates:

(9)    $M = \frac{B + G}{2} = \left(\frac{x_1 + x_2}{2}, \frac{y_1 + y_2}{2}\right).$
Figure 8.5 The triangle $\triangle BGW$ and point R and extended point E.

Reflection Using the Point R 

The function decreases as we move along the side of the triangle from W to B, and it
decreases as we move along the side from W to G. Hence it is feasible that f(x, y)
takes on smaller values at points that lie away from W on the opposite side of the line
between B and G. We choose a test point R that is obtained by "reflecting" the triangle
through the side $\overline{BG}$. To determine R, we first find the midpoint M of the side $\overline{BG}$.
Then draw the line segment from W to M and call its length d. This last segment is
extended a distance d through M to locate the point R (see Figure 8.4). The vector
formula for R is

(10)    $R = M + (M - W) = 2M - W.$


Expansion Using the Point E 

If the function value at R is smaller than the function value at W, then we have moved
in the correct direction toward the minimum. Perhaps the minimum is just a bit farther
than the point R. So we extend the line segment through M and R to the point E.
This forms an expanded triangle BGE. The point E is found by moving an additional
distance d along the line joining M and R (see Figure 8.5). If the function value at E
Contraction Using the Point C

If the function values at R and W are the same, another point must be tested. Perhaps
the function is smaller at M, but we cannot replace W with M because we must have
a triangle. Consider the two midpoints $C_1$ and $C_2$ of the line segments $\overline{WM}$ and $\overline{MR}$,
respectively (see Figure 8.6). The point with the smaller function value is called C,
and the new triangle is BGC. Note: the choice between $C_1$ and $C_2$ might seem
inappropriate for the two-dimensional case, but it is important in higher dimensions.


Shrink toward B 

If the function value at C is not less than the value at W, the points G and W must be
shrunk toward B (see Figure 8.7). The point G is replaced with M, and W is replaced
with S, which is the midpoint of the line segment joining B with W.
Table 8.3 Logical Decisions for the Nelder-Mead Algorithm

IF f(R) < f(G), THEN Perform Case (i) {either reflect or extend}
ELSE Perform Case (ii) {either contract or shrink}

BEGIN {Case (i).}
   IF f(B) < f(R) THEN
      replace W with R
   ELSE
      Compute E and f(E)
      IF f(E) < f(B) THEN
         replace W with E
      ELSE
         replace W with R
      ENDIF
   ENDIF
END {Case (i).}

BEGIN {Case (ii).}
   IF f(R) < f(W) THEN
      replace W with R
   Compute C = (W + M)/2
      or C = (M + R)/2 and f(C)
   IF f(C) < f(W) THEN
      replace W with C
   ELSE
      Compute S and f(S)
      replace W with S
      replace G with M
   ENDIF
END {Case (ii).}


Logical Decisions For Each Step 

A computationally efficient algorithm should perform function evaluations only if
needed. In each step, a new vertex is found, which replaces W. As soon as it is
found, further investigation is not needed, and the iteration step is completed. The
logical details for two-dimensional cases are explained in Table 8.3.
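
One pass through the decisions of Table 8.3 might be sketched as follows. This is not Program 8.2: it performs a single step for two variables, it reads the contraction step as C = (W + M)/2 taken after W has possibly been replaced by R (one interpretation of the table), and the function and starting triangle are borrowed from Example 8.3 and the example that follows purely for illustration.

```matlab
f = @(P) P(1)^2 - 4*P(1) + P(2)^2 - P(2) - P(1)*P(2);

V = [0 0; 1.2 0; 0 0.8];                 % rows are the three vertices
z = [f(V(1,:)); f(V(2,:)); f(V(3,:))];
[z, idx] = sort(z);                      % order so that z1 <= z2 <= z3
V = V(idx, :);
B = V(1,:);  G = V(2,:);  W = V(3,:);

M = (B + G)/2;                           % midpoint of the good side
R = 2*M - W;                             % reflected point

if f(R) < f(G)                           % Case (i): reflect or extend
    if f(B) < f(R)
        W = R;
    else
        E = 2*R - M;                     % extended point
        if f(E) < f(B)
            W = E;
        else
            W = R;
        end
    end
else                                     % Case (ii): contract or shrink
    if f(R) < f(W)
        W = R;
    end
    C = (W + M)/2;                       % contraction point (assumed reading)
    if f(C) < f(W)
        W = C;
    else
        S = (B + W)/2;                   % shrink toward B
        W = S;
        G = M;
    end
end
fprintf('new vertex: (%g, %g), f = %g\n', W(1), W(2), f(W));
```

For this starting triangle the step ends in Case (i) with the extended point replacing W.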

Example 8.4. Use the Nelder-Mead algorithm to find the minimum of $f(x, y) = x^2 -
4x + y^2 - y - xy$. Start with the three vertices

$V_1 = (0, 0), \quad V_2 = (1.2, 0.0), \quad V_3 = (0.0, 0.8).$

The function f(x, y) takes on the values

$f(0, 0) = 0.0, \quad f(1.2, 0.0) = -3.36, \quad f(0.0, 0.8) = -0.16.$

The function values must be compared to determine B, G, and W:

$B = (1.2, 0.0), \quad G = (0.0, 0.8), \quad W = (0, 0).$

The vertex $W = (0, 0)$ will be replaced. The points M and R are

$M = \frac{B + G}{2} = (0.6, 0.4)$ and $R = 2M - W = (1.2, 0.8).$

The function value $f(R) = f(1.2, 0.8) = -4.48$ is less than f(G), so the situation is
case (i). Since $f(R) \le f(B)$, we have moved in the right direction, and the vertex E must
be constructed:

$E = 2R - M = 2(1.2, 0.8) - (0.6, 0.4) = (1.8, 1.2).$

The function value $f(E) = f(1.8, 1.2) = -5.88$ is less than f(B), and the new triangle
has vertices

$V_1 = (1.8, 1.2), \quad V_2 = (1.2, 0.0), \quad V_3 = (0.0, 0.8).$

The process continues and generates a sequence of triangles that converges down on the
solution point (3, 2) (see Figure 8.8). Table 8.4 gives the function values at vertices of the
triangle for several steps in the iteration. A computer implementation of the algorithm
continued until the thirty-third step, where the best vertex was $B = (2.99996456, 1.99983839)$
and $f(B) = -6.99999998$. These values are approximations to $f(3, 2) = -7$ found in
Example 8.3. The reason that the iteration quit before (3, 2) was obtained is that the
function is flat near the minimum. The function values f(B), f(G), and f(W) were checked
and found to be the same (this is an example of round-off error), and the algorithm was
terminated. ■

Figure 8.8 The sequence of triangles $\{T_k\}$ converging to the point (3, 2) for the
Nelder-Mead method.

Minimization Using Derivatives

Suppose that f(x) is unimodal over $[a, b]$ and has a unique minimum at $x = p$. Also,
assume that $f'(x)$ is defined at all points in $(a, b)$. Let the starting value $p_0$ lie in
$(a, b)$. If $f'(p_0) < 0$, the minimum point p lies to the right of $p_0$. If $f'(p_0) > 0$, p
lies to the left of $p_0$ (see Figure 8.9).
Table 8.4 Function Values at Various Triangles for Example 8.4

k     Best point              Good point                         Worst point
1     f(1.2, 0.0) = -3.36     f(0.0, 0.8) = -0.16                f(0.0, 0.0) = 0.00
2     f(1.8, 1.2) = -5.88     f(1.2, 0.0) = -3.36                f(0.0, 0.8) = -0.16
3     f(1.8, 1.2) = -5.88     f(3.0, 0.4) = -4.44                f(1.2, 0.0) = -3.36
4     f(3.6, 1.6) = -6.24     f(1.8, 1.2) = -5.88                f(3.0, 0.4) = -4.44
5     f(3.6, 1.6) = -6.24     f(2.4, 2.4) = -6.24                f(1.8, 1.2) = -5.88
6     f(2.4, 1.6) = -6.72     f(3.6, 1.6) = -6.24                f(2.4, 2.4) = -6.24
7     f(3.0, 1.8) = -6.96     f(2.4, 1.6) = -6.72                f(2.4, 2.4) = -6.24
8     f(3.0, 1.8) = -6.96     f(2.55, 2.05) = -6.7725            f(2.4, 1.6) = -6.72
9     f(3.0, 1.8) = -6.96     f(3.15, 2.25) = -6.9525            f(2.55, 2.05) = -6.7725
10    f(3.0, 1.8) = -6.96     f(2.8125, 2.0375) = -6.95640625    f(3.15, 2.25) = -6.9525



Figure 8.9 Using $f'(x)$ to find the minimum value of the unimodal function
f(x) on the interval $[a, b]$. Left panel: if $f'(p_0) < 0$, then p lies in $[p_0, b]$.
Right panel: if $f'(p_0) > 0$, then p lies in $[a, p_0]$.

Bracketing the Minimum 

Our first task is to obtain three test values,

(12)    $p_0, \quad p_1 = p_0 + h, \quad \text{and} \quad p_2 = p_0 + 2h,$

so that

(13)    $f(p_0) > f(p_1) \quad \text{and} \quad f(p_1) < f(p_2).$

Suppose that $f'(p_0) < 0$; then $p_0 < p$ and the step size h should be chosen positive.
It is an easy task to find a value for h so that the three points in (12) satisfy (13). Start
with $h = 1$ in formula (12) (provided that $a + 1 < b$).

Case (i): If (13) is satisfied, we are done.

Case (ii): If $f(p_0) > f(p_1)$ and $f(p_1) > f(p_2)$, then $p_2 < p$. We need
to check points that lie farther to the right. Double the step size and
repeat the process.

Case (iii): If $f(p_0) \le f(p_1)$, we have jumped over p and h is too large. We need
to check values closer to $p_0$. Reduce the step size by a factor of 1/2 and
repeat the process.

When $f'(p_0) > 0$, the step size h should be chosen negative and then cases similar
to (i) to (iii) can be used.
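
The three cases can be sketched as a short loop. The function is borrowed from Example 8.2, where $f'(0) = -1 < 0$, so h starts positive. The iteration cap of 50 and the treatment of a tie $f(p_1) = f(p_2)$ as case (ii) are assumptions made here, not part of the text.

```matlab
f  = @(x) x.^2 - sin(x);
p0 = 0;                       % starting value with f'(p0) < 0
h  = 1;                       % initial step size

for iter = 1:50               % assumed cap on the number of adjustments
    p1 = p0 + h;   p2 = p0 + 2*h;
    y0 = f(p0);    y1 = f(p1);    y2 = f(p2);
    if y0 > y1 && y1 < y2
        break                 % Case (i): condition (13) holds
    elseif y0 > y1
        h = 2*h;              % Case (ii): still descending, look farther right
    else
        h = h/2;              % Case (iii): jumped over p, look closer to p0
    end
end
fprintf('h = %g brackets the minimum in [%g, %g]\n', h, p0, p2);
```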


Quadratic Approximation to Find p 

Finally, we have three points (12) that satisfy (13). We will use quadratic interpolation
to find $p_{\min}$, which is an approximation to p. With $y_i = f(p_i)$, the Lagrange polynomial
based on the nodes in (12) is

(14)    $Q(x) = \frac{y_0 (x - p_1)(x - p_2)}{2h^2} - \frac{y_1 (x - p_0)(x - p_2)}{h^2} + \frac{y_2 (x - p_0)(x - p_1)}{2h^2}.$

The derivative of Q(x) is

(15)    $Q'(x) = \frac{y_0 (2x - p_1 - p_2)}{2h^2} - \frac{y_1 (2x - p_0 - p_2)}{h^2} + \frac{y_2 (2x - p_0 - p_1)}{2h^2}.$

Solving $Q'(x) = 0$ in the form $Q'(p_0 + h_{\min}) = 0$ yields

(16)    $0 = \frac{y_0 (2(p_0 + h_{\min}) - p_1 - p_2)}{2h^2} - \frac{y_1 (4(p_0 + h_{\min}) - 2p_0 - 2p_2)}{2h^2} + \frac{y_2 (2(p_0 + h_{\min}) - p_0 - p_1)}{2h^2}.$

Multiply each term in (16) by $2h^2$ and collect terms involving $h_{\min}$:

$-h_{\min}(2y_0 - 4y_1 + 2y_2) = y_0 (2p_0 - p_1 - p_2) - y_1 (4p_0 - 2p_0 - 2p_2) + y_2 (2p_0 - p_0 - p_1)$
$\qquad\qquad\qquad\qquad\qquad\;\; = y_0(-3h) - y_1(-4h) + y_2(-h).$

This last quantity is easily solved for $h_{\min}$:

(17)    $h_{\min} = \frac{h(4y_1 - 3y_0 - y_2)}{4y_1 - 2y_0 - 2y_2}.$

The value $p_{\min} = p_0 + h_{\min}$ is a better approximation to p than $p_0$. Hence we
can replace $p_0$ with $p_{\min}$ and repeat the two processes outlined above to determine a
new h and a new $h_{\min}$. Continue the iteration until the desired accuracy is achieved.
The details are outlined in Program 8.3.
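
A single quadratic-interpolation step using formula (17) might look like the sketch below. The bracket $p_0 = 0$, $h = 0.5$ for $f(x) = x^2 - \sin(x)$ is an illustrative choice satisfying (13), and the handle `dQ` simply re-evaluates (15) to confirm that the computed point is a stationary point of the interpolating parabola.

```matlab
f  = @(x) x.^2 - sin(x);
p0 = 0;   h = 0.5;                 % assumed bracket: f(p0) > f(p1) < f(p2)
p1 = p0 + h;   p2 = p0 + 2*h;
y0 = f(p0);    y1 = f(p1);    y2 = f(p2);

% Formula (17) for the step to the vertex of the interpolating parabola
hmin = h*(4*y1 - 3*y0 - y2) / (4*y1 - 2*y0 - 2*y2);
pmin = p0 + hmin;

% Derivative (15) of the Lagrange polynomial, used only as a check
dQ = @(x) y0*(2*x - p1 - p2)/(2*h^2) - y1*(2*x - p0 - p2)/h^2 ...
        + y2*(2*x - p0 - p1)/(2*h^2);

fprintf('pmin = %.6f, Q''(pmin) = %.2e\n', pmin, dQ(pmin));
```

One step moves the estimate from $p_0 = 0$ to roughly 0.436, already close to the minimizer 0.4501836 found in Example 8.2. Repeating with a smaller h around the new point refines it further.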
Steepest Descent or Gradient Method

Now let us turn to the minimization of a function $f(X)$ of N variables, where $X =
(x_1, x_2, \ldots, x_N)$. The gradient of $f(X)$ is a vector function defined as follows:

(18)    $\operatorname{grad} f(X) = (f_1, f_2, \ldots, f_N),$

where the partial derivatives $f_k = \partial f/\partial x_k$ are evaluated at X.

Recall that the gradient vector (18) points locally in the direction of the greatest rate
of increase of $f(X)$. Hence $-\operatorname{grad} f(X)$ points locally in the direction of the greatest
decrease. Start at the point $P_0$ and search along the line through $P_0$ in the direction
$S_0 = -G/\|G\|$, where $G = \operatorname{grad} f(P_0)$. You will arrive at a point $P_1$, where a local
minimum occurs when the point X is constrained to lie on the line $X = P_0 + tS_0$.
Next, we can compute $G = \operatorname{grad} f(P_1)$ and move in the search direction $S_1 =
-G/\|G\|$. You will come to $P_2$, where a local minimum occurs when X is constrained
to lie on the line $X = P_1 + tS_1$. Iteration will produce a sequence $\{P_k\}$ of points with
the property $f(P_0) > f(P_1) > \cdots > f(P_k) > \cdots$. If $\lim_{k \to \infty} P_k = P$, then $f(P)$
will be a local minimum for $f(X)$.

Outline of the Gradient Method

Suppose that $P_k$ has been obtained.

Step 1. Evaluate the gradient vector $G = \operatorname{grad} f(P_k)$.

Step 2. Compute the search direction $S = -G/\|G\|$.

Step 3. Perform a single parameter minimization of $\Phi(t) = f(P_k + tS)$ on the
interval $[0, b]$, where b is large. This will produce a value $t = h_{\min}$ where
a local minimum for $\Phi(t)$ occurs. The relation $\Phi(h_{\min}) = f(P_k + h_{\min}S)$
shows that this is a minimum for $f(X)$ along the search line $X = P_k + tS$.

Step 4. Construct the next point $P_{k+1} = P_k + h_{\min}S$.

Step 5. Perform the termination test for minimization; that is, are the function
values $f(P_k)$ and $f(P_{k+1})$ sufficiently close and the distance $\|P_{k+1} - P_k\|$
small enough?

Repeat the process.
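
The five steps can be sketched for the function of Example 8.3, whose minimum at (3, 2) is already known. Everything specific here is an assumption for illustration: the starting point, the search interval $[0, 10]$ for Step 3, the tolerances, the iteration cap, and the use of an inline golden search for the one-parameter minimization.

```matlab
f     = @(P) P(1)^2 - 4*P(1) + P(2)^2 - P(2) - P(1)*P(2);
gradf = @(P) [2*P(1) - 4 - P(2), 2*P(2) - 1 - P(1)];
r = (sqrt(5) - 1)/2;                 % golden ratio for the line search

P = [0 0];                           % assumed starting point
for k = 1:100                        % assumed iteration cap
    G = gradf(P);                    % Step 1: gradient
    if norm(G) < 1e-12, break; end   % already at a critical point
    S = -G / norm(G);                % Step 2: search direction
    phi = @(t) f(P + t*S);           % Step 3: minimize phi on [0, 10]
    a = 0;  b = 10;
    c = a + (1 - r)*(b - a);  fc = phi(c);
    d = a + r*(b - a);        fd = phi(d);
    while (b - a) > 1e-7
        if fc <= fd
            b = d;  d = c;  fd = fc;
            c = a + (1 - r)*(b - a);  fc = phi(c);
        else
            a = c;  c = d;  fc = fd;
            d = a + r*(b - a);  fd = phi(d);
        end
    end
    hmin = (a + b)/2;
    Pnew = P + hmin*S;               % Step 4: next point
    % Step 5: termination test on function values and on distance
    done = abs(f(Pnew) - f(P)) < 1e-10 && norm(Pnew - P) < 1e-6;
    P = Pnew;
    if done, break; end
end
fprintf('P = (%.6f, %.6f), f(P) = %.8f after %d steps\n', P(1), P(2), f(P), k);
```

Successive search directions are nearly perpendicular, so the iterates zigzag toward (3, 2). That is typical of steepest descent with an accurate line search.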

Program 8.1 (Golden Search for a Minimum). To numerically approximate the
minimum of f(x) on the interval [a, b] by using a golden search. Proceed with the
method only if f(x) is a unimodal function on the interval [a, b].

function [S,E,G]=golden(f,a,b,delta,epsilon)
%Input  - f is the object function input as a string 'f'
%       - a and b are the end points of the interval
%       - delta is the tolerance for the abscissas
%       - epsilon is the tolerance for the ordinates
%Output - S=(p,yp) contains the abscissa p and
%         the ordinate yp of the minimum
%       - E=(dp,dy) contains the error bounds for p and yp
%       - G is an n x 4 matrix: the kth row contains
%         [ak ck dk bk]; the values of a, c, d, and b at the
%         kth iteration
r1=(sqrt(5)-1)/2;
r2=r1^2;
h=b-a;
ya=feval(f,a);
yb=feval(f,b);
c=a+r2*h;
d=a+r1*h;
yc=feval(f,c);
yd=feval(f,d);
k=1;
A(k)=a;B(k)=b;C(k)=c;D(k)=d;
while(abs(yb-ya)>epsilon)|(h>delta)
   k=k+1;
   if(yc<yd)
      % Squeeze from the right: new interval is [a,d]
      b=d;
      yb=yd;
      d=c;
      yd=yc;
      h=b-a;
      c=a+r2*h;
      yc=feval(f,c);
   else
      % Squeeze from the left: new interval is [c,b]
      a=c;
      ya=yc;
      c=d;
      yc=yd;
      h=b-a;
      d=a+r1*h;
      yd=feval(f,d);
   end
   A(k)=a;B(k)=b;C(k)=c;D(k)=d;
end
dp=abs(b-a);
dy=abs(yb-ya);
p=a;
yp=ya;
if(yb<ya)
   p=b;
   yp=yb;
end
G=[A' C' D' B'];
S=[p yp];
E=[dp dy];

Programs 8.2 and 8.4 require that the object function f be saved as an M-file. The argument of f needs to be a 1 x n array. To illustrate, consider saving the function in Example 8.3 as an M-file:

function z=f(V)
z=0; x=V(1); y=V(2);
z=x.^2-4*x+y.^2-y-x.*y;


Program 8.2 (Nelder-Mead's Minimization Method). To approximate a local minimum of $f(x_1, x_2, \ldots, x_N)$, where f is a continuous function of N real variables, and given the N + 1 initial starting points $V_k = (v_{k,1}, \ldots, v_{k,N})$ for k = 0, 1, ..., N.

function [V0,y0,dV,dy,P,Q]=nelder(F,V,min1,max1,epsilon,show)
%Input  - F is the object function input as a string 'F'
%       - V is an (n+1) x n matrix containing the starting simplex
%       - min1 & max1 are minimum and maximum number
%         of iterations
%       - epsilon is the tolerance
%       - show == 1 displays iterations (P and Q)
%Output - V0 is the vertex for the minimum
%       - y0 is the function value F(V0)
%       - dV is the size of the final simplex
%       - dy is the error bound for the minimum
%       - P is a matrix containing the vertex iterations
%       - Q is an array containing the iterations for F(P)
if nargin==5,
   show=0;
end
[mm n]=size(V);
% Order the vertices
for j=1:n+1
   Z=V(j,1:n);
   Y(j)=feval(F,Z);
end
[mm lo]=min(Y);
[mm hi]=max(Y);
li=hi;
ho=lo;
for j=1:n+1
   if(j~=lo&j~=hi&Y(j)<=Y(li))
      li=j;
   end
   if(j~=hi&j~=lo&Y(j)>=Y(ho))
      ho=j;
   end
end
cnt=0;
% Start of Nelder-Mead algorithm
while(Y(hi)>Y(lo)+epsilon&cnt<max1)|cnt<min1
   S=zeros(1,n);
   for j=1:n+1
      S=S+V(j,1:n);
   end
   M=(S-V(hi,1:n))/n;
   R=2*M-V(hi,1:n);
   yR=feval(F,R);
   if(yR<Y(ho))
      if(Y(li)<yR)
         V(hi,1:n)=R;
         Y(hi)=yR;
      else
         E=2*R-M;
         yE=feval(F,E);
         if(yE<Y(li))
            V(hi,1:n)=E;
            Y(hi)=yE;
         else
            V(hi,1:n)=R;
            Y(hi)=yR;
         end
      end
   else
      if(yR<Y(hi))
         V(hi,1:n)=R;
         Y(hi)=yR;
      end
      C=(V(hi,1:n)+M)/2;
      yC=feval(F,C);
      C2=(M+R)/2;
      yC2=feval(F,C2);
      if(yC2<yC)
         C=C2;
         yC=yC2;
      end
      if(yC<Y(hi))
         V(hi,1:n)=C;
         Y(hi)=yC;
      else
         % Shrink the simplex toward the best vertex
         for j=1:n+1
            if(j~=lo)
               V(j,1:n)=(V(j,1:n)+V(lo,1:n))/2;
               Z=V(j,1:n);
               Y(j)=feval(F,Z);
            end
         end
      end
   end
   [mm lo]=min(Y);
   [mm hi]=max(Y);
   li=hi;
   ho=lo;
   for j=1:n+1
      if(j~=lo&j~=hi&Y(j)<=Y(li))
         li=j;
      end
      if(j~=hi&j~=lo&Y(j)>=Y(ho))
         ho=j;
      end
   end
   cnt=cnt+1;
   P(cnt,:)=V(lo,:);
   Q(cnt)=Y(lo);
end
% End of Nelder-Mead algorithm
%Determine size of simplex
snorm=0;
for j=1:n+1
   s=norm(V(j,:)-V(lo,:));
   if(s>=snorm)
      snorm=s;
   end
end
V0=V(lo,1:n);
y0=Y(lo);
dV=snorm;
dy=abs(Y(hi)-Y(lo));
if (show==1)
   disp(P);
   disp(Q);
end
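
The candidate points that Program 8.2 builds from the worst vertex can be checked by hand on a single triangle. The sketch below uses made-up vertices (not taken from the text) and the same formulas as the listing: the centroid M of the remaining vertices, the reflection R = 2M - W, the expansion E = 2R - M, and the two contraction candidates.

```matlab
% Reflection, expansion, and contraction points for one triangle.
% B, G, W are illustrative vertices (best, good, worst), not data from the text.
B = [0 0];  G = [2 0];  W = [0 2];
M  = (B + G)/2;     % midpoint of the side opposite W
R  = 2*M - W;       % reflection of W through M
E  = 2*R - M;       % expansion: one more step past R
C1 = (W + M)/2;     % contraction candidate on the W side
C2 = (M + R)/2;     % contraction candidate on the R side
disp([M; R; E; C1; C2])
```

In Program 8.2 the point with the smaller function value among C1 and C2 is the one kept as the contraction point C.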


Program 8.3 (Local Minimum Search Using Quadratic Interpolation). To find a local minimum of the function f(x) over the interval [a, b] by starting with one initial approximation $p_0$ and then searching the intervals $[a, p_0]$ and $[p_0, b]$.

function [p,yp,dp,dy,P]=quadmin(f,a,b,delta,epsilon)
%Input  - f is the object function input as a string 'f'
%       - a and b are the end points of the interval
%       - delta is the tolerance for the abscissas
%       - epsilon is the tolerance for the ordinates
%Output - p is the abscissa of the minimum
%       - yp is the ordinate of the minimum
%       - dp is the error bound for p
%       - dy is the error bound for yp
%       - P is the vector of iterations
p0=a;
maxj=20;
maxk=30;
big=1e6;
err=1;
k=1;
P(k)=p0;
cond=0;
h=1;
if (abs(p0)>1e4),h=abs(p0)/1e4;end
while (k<maxk&err>epsilon&cond~=5)
   f1=(feval(f,p0+0.00001)-feval(f,p0-0.00001))/0.00002;
   if(f1>0),h=-abs(h);end
   p1=p0+h;
   p2=p0+2*h;
   pmin=p0;
   y0=feval(f,p0);
   y1=feval(f,p1);
   y2=feval(f,p2);
   ymin=y0;
   cond=0;
   j=0;
   %Determine h so that y1<y0&y1<y2
   while(j<maxj&abs(h)>delta&cond==0)
      if (y0<=y1),
         p2=p1;
         y2=y1;
         h=h/2;
         p1=p0+h;
         y1=feval(f,p1);
      else
         if(y2<y1),
            p1=p2;
            y1=y2;
            h=2*h;
            p2=p0+2*h;
            y2=feval(f,p2);
         else
            cond=-1;
         end
      end
      j=j+1;   %advance the counter so that the maxj limit takes effect
      if (abs(h)>big|abs(p0)>big),cond=5;end
   end
   if(cond==5),
      pmin=p1;
      ymin=feval(f,p1);
   else
      %Quadratic interpolation to find yp
      d=4*y1-2*y0-2*y2;
      if(d<0),
         hmin=h*(4*y1-3*y0-y2)/d;
      else
         hmin=h/3;
         cond=4;
      end
      pmin=p0+hmin;
      ymin=feval(f,pmin);
      h=abs(h);
      h0=abs(hmin);
      h1=abs(hmin-h);
      h2=abs(hmin-2*h);
      %Determine magnitude of next h
      if(h0<h),h=h0;end
      if(h1<h),h=h1;end
      if(h2<h),h=h2;end
      if(h==0),h=hmin;end
      if(h<delta),cond=1;end
      if (abs(h)>big|abs(pmin)>big),cond=5;end
      %Termination test for minimization
      e0=abs(y0-ymin);
      e1=abs(y1-ymin);
      e2=abs(y2-ymin);
      if(e0~=0&e0<err),err=e0;end
      if(e1~=0&e1<err),err=e1;end
      if(e2~=0&e2<err),err=e2;end
      if(e0==0&e1==0&e2==0),err=0;end
      if(err<epsilon),cond=2;end
      p0=pmin;
      k=k+1;
      P(k)=p0;
   end
   if(cond==2&h<delta),cond=3;end
end
p=p0;
dp=h;
yp=feval(f,p);
dy=err;

Program 8.4 requires that the object function f be saved as an M-file. Additionally, the search direction $-\operatorname{grad} f/\|\operatorname{grad} f\|$ needs to be saved as an M-file. To illustrate, consider the function f from Example 8.3, where the gradient of f is (2x - 4 - y, 2y - 1 - x). An appropriate M-file for this particular function f is

function z=G(V)
z=zeros(1,2);
x=V(1);y=V(2);
g=[2*x-4-y 2*y-1-x];
z=-(1/norm(g))*g;

Program 8.4 (Steepest Descent or Gradient Method). To numerically approximate a local minimum of f(X), where f is a continuous function of N real variables and $X = (x_1, x_2, \ldots, x_N)$, by starting with one point $P_0$ and using the gradient method.

function [P0,y0,err,P]=grads(F,G,P0,max1,delta,epsilon,show)
%Input  - F is the object function input as a string 'F'
%       - G = -(1/norm(grad F))*grad F; the search direction
%         input as a string 'G'
%       - P0 is the initial starting point
%       - max1 is the maximum number of iterations
%       - delta is the tolerance for hmin in the single
%         parameter minimization in the search direction
%       - epsilon is the tolerance for the error in y0
%       - show; if show==1 the iterations are displayed
%Output - P0 is the point for the minimum
%       - y0 is the function value F(P0)
%       - err is the error bound for y0
%       - P is a vector containing the iterations
if nargin==6,show=0;end   %show is the seventh argument
[mm n]=size(P0);
maxj=10; big=1e8; h=1;
P=zeros(maxj,n+1);
len=norm(P0);
y0=feval(F,P0);
if (len>1e4),h=len/1e4;end
err=1;cnt=0;cond=0;
P(cnt+1,:)=[P0 y0];
while (cnt<max1&cond~=5&(h>delta|err>epsilon))
   %Compute search direction
   S=feval(G,P0);
   %Start single parameter quadratic minimization
   P1=P0+h*S;
   P2=P0+2*h*S;
   y1=feval(F,P1);
   y2=feval(F,P2);
   cond=0;j=0;
   while(j<maxj&cond==0)
      len=norm(P0);
      if (y0<y1)
         P2=P1;
         y2=y1;
         h=h/2;
         P1=P0+h*S;
         y1=feval(F,P1);
      else
         if(y2<y1)
            P1=P2;
            y1=y2;
            h=2*h;
            P2=P0+2*h*S;
            y2=feval(F,P2);
         else
            cond=-1;
         end
      end
      j=j+1;   %advance the counter so that the maxj limit takes effect
      if(h<delta),cond=1;end
      if(abs(h)>big|len>big),cond=5;end
   end
   if(cond==5)
      Pmin=P1;
      ymin=y1;
   else
      d=4*y1-2*y0-2*y2;
      if(d<0)
         hmin=h*(4*y1-3*y0-y2)/d;
      else
         cond=4;
         hmin=h/3;
      end
      %Construct the next point
      Pmin=P0+hmin*S;
      ymin=feval(F,Pmin);
      %Determine magnitude of next h
      h0=abs(hmin);
      h1=abs(hmin-h);
      h2=abs(hmin-2*h);
      if(h0<h),h=h0;end
      if(h1<h),h=h1;end
      if(h2<h),h=h2;end
      if(h==0),h=hmin;end
      if(h<delta),cond=1;end
      %Termination test for minimization
      e0=abs(y0-ymin);
      e1=abs(y1-ymin);
      e2=abs(y2-ymin);
      if(e0~=0&e0<err),err=e0;end
      if(e1~=0&e1<err),err=e1;end
      if(e2~=0&e2<err),err=e2;end
      if(e0==0&e1==0&e2==0),err=0;end
      if(err<epsilon),cond=2;end
      if(cond==2&h<delta),cond=3;end
   end
   cnt=cnt+1;
   P(cnt+1,:)=[Pmin ymin];
   P0=Pmin;
   y0=ymin;
end
if(show==1)
   disp(P);
end


Exercises for Minimization of a Function 


1. Use Theorem 8.1 to determine where each of the following functions is increasing and where it is decreasing.

(a) $f(x) = 2x^3 - 9x^2 + 12x - 5$

(b) $f(x) = x/(x + 1)$

(c) $f(x) = (x + 1)/x$

(d) $f(x) = x^x$

2. Use Definition 8.3 to show that the following functions are unimodal on the given intervals.

(a) $f(x) = x^2 - 2x + 1$; [0, 4]

(b) $f(x) = \cos(x)$; [0, 3]

(c) $f(x) = x^x$; [1, 10]

(d) $f(x) = -x(3 - x)^3/3$; [0, 3]

3. Use Theorems 8.3 and 8.4, if possible, to find all local minima and maxima of each of the following functions on the given interval.

(a) $f(x) = 4x^3 - 8x^2 - 11x + 5$; [0, 2]

(b) $f(x) = x + 3/x^2$; [0.5, 3]

(c) $f(x) = (x + 2.5)/(4 - x^2)$; [-1.9, 1.9]

(d) $f(x) = e^x/x^2$; [0.5, 3]

(e) $f(x) = -\sin(x) - \sin(3x)/3$; [0, 2]

(f) $f(x) = -2\sin(x) + \sin(2x) - 2\sin(3x)/3$; [1, 3]

4. Find the point on the parabola $y = x^2$ that is closest to the point (3, 1).

5. Find the point on the curve $y = \sin(x)$ that is closest to the point (2, 1).

6. Find the point(s) on the circle $x^2 + y^2 = 25$ that is farthest from the chord AB if A = (3, 4) and B = $(-1, \sqrt{24})$.

7. Use Theorem 8.5 to find the local minimum of each of the following functions.

(a) $f(x, y) = x^3 + y^3 - 3x - 3y + 5$

(b) $f(x, y) = x^2 + y^2 + x - 2y - xy + 1$

(c) $f(x, y) = x^2y + xy^2 - 3xy$

(d) $f(x, y) = (x - y)/(x^2 + y^2 + 2)$

(e) $f(x, y) = 100(y - x^2)^2 + (1 - x)^2$
(Rosenbrock's parabolic valley, circa 1960)

8. Let B = (2, -3), G = (1, 1), and W = (5, 2). Find the points M, R, and E and sketch the triangles that are involved.

9. Let B = (-1, 2), G = (-2, -5), and W = (3, 1). Find the points M, R, and E and sketch the triangles that are involved.

10. Give a vector proof that M = (B + G)/2 is the midpoint of the line segment joining the points B and G.

11. Give a vector proof of equation (10).

12. Give a vector proof of equation (11).

13. Give a vector proof that the medians of any triangle intersect at a point that is two- 
thirds of the distance from each vertex to the midpoint of the opposite side. 

14. Let B = (0, 0, 0), G = (1, 1, 0), P = (0, 0, 1), and W = (1, 0, 0).

(a) Sketch the tetrahedron BGPW.

(b) Find M = (B + G + P)/3.

(c) Find R = 2M - W and sketch the tetrahedron BGPR.

(d) Find E = 2R - M and sketch the tetrahedron BGPE.

15. Let B = (0, 0, 0), G = (0, 2, 0), P = (0, 1, 1), and W = (2, 1, 0). Follow the instructions in Exercise 14.


Algorithms and Programs 


1. Use Program 8.1 to find the local minimum of each of the functions in Exercise 3 with an accuracy of eight decimal places.

2. Use Program 8.3 to find the local minimum of each of the functions in Exercise 3 with an accuracy of eight decimal places. Start with the midpoint of the given interval.

3. Use Program 8.2 to find the minimum of each of the functions in Exercise 7 with an accuracy of eight decimal places. Use the following starting vertices:

(a) (1,2), (2,0), and (2, 2) 

(b) (0,0), (2,0), and (2,1) 

(c) (0,0), (2,0), and (2, 1) 

(d) (0, 0), (0, 1), and (1, 1)

(e) (0,0), (1,0), and (0,2) 

4. Use Program 8.4 to find the minimum of each of the functions in Exercise 7 with an 
accuracy of eight decimal places. Use the starting vertex: 

(a) (1,2) (b) (0,0.3) (c) (0.1,0.1) 

(d) (0.5,0.11) (e) (0,0) 

5. In Program 8.4 the x and y coordinates of the iterations are stored in the first two columns of the matrix P, respectively. Modify Program 8.4 so that it will plot the x and y coordinates of the iterations on the same coordinate system. Hint. Incorporate the command plot(P(:,1),P(:,2),'.') into your program. Use this program on the functions in Exercise 7.

6. Use Program 8.2 to find the local minimum of each of the following functions with an accuracy of eight decimal places.

(a) $f(x, y, z) = 2x^2 + 2y^2 + z^2 - 2xy + yz - 7y - 4z$
Start with (1, 1, 1), (0, 1, 0), (1, 0, 1), and (0, 0, 1).

(b) $f(x, y, z, u) = 2(x^2 + y^2 + z^2 + u^2) - x(y + z - u) + yz - 3x - 8y - 5z - 9u$
Start the search near (1, 1, 1, 1).

(c) $f(x, y, z, u) = xyzu + \dfrac{1}{x} + \dfrac{1}{y} + \dfrac{1}{z} + \dfrac{1}{u}$
Start the search near (0.7, 0.7, 0.7, 0.7).

7. Use Program 8.4 to find the local minimum of each of the functions in Problem 6. 
Use a starting value near one of the given vertices. 

8. Use Program 8.1 and/or 8.3 to find all local maxima and minima of the following function in the interval [0, 2].

x 3 +x 2 -12x-12 
~ 2* 6 - 3x 5 - 4x 4 + + 12* - 18 

9. Find the point on the surface $z = x^2 + y^2$ that is closest to the point (2, 3, 1).

10. A company has five factories A, B, C, D, and E, located at the points (10, 10), (30, 50), (16.667, 29), (0.555, 29.888), and (22.2221, 49.988), respectively, in the xy-plane. Assume that the distance between two points represents the driving distance, in miles, between the factories. The company plans to build a warehouse at some point in the plane. It is anticipated that during an average week there will be 18, 20, 14, and 25 deliveries made to factories A, B, C, D, and E, respectively. Ideally, to minimize the weekly mileage of delivery vehicles, where should the warehouse be located?

11. In Problem 10, where should the warehouse be located if, due to zoning restrictions, 
it must be located at a point on the curve y = x 2 ? 




Solution of Differential Equations 


Differential equations are commonly used for mathematical modeling in science and engineering. Often there is no known analytic solution and numerical approximations are required. As an illustration, we consider population dynamics and a nonlinear system that is a modification of the Lotka-Volterra equations:

$x' = f(t, x, y) = x - xy - \frac{1}{10}x^2$ and $y' = g(t, x, y) = xy - y - \frac{1}{20}y^2$,

with the initial condition x(0) = 2 and y(0) = 1 for $0 \le t \le 30$. Although the numerical solution is a list of numbers, it is helpful to plot the polygonal path joining the approximation points $\{(x_k, y_k)\}$ and plot the trajectory shown in Figure 9.1. In this chapter we present the standard methods for solving ordinary differential equations, systems of differential equations, and boundary value problems.

Figure 9.2 The solution curves $y(t) = t + e^{-t} + C$.



where C is the constant of integration. All the functions in (2) are solutions of (1) because they satisfy the requirement that $y'(t) = 1 - e^{-t}$. They form the family of curves in Figure 9.2.

Integration was the technique used to find the explicit formula for the functions in (2), and Figure 9.2 emphasizes that there is one degree of freedom involved in the solution, that is, the constant of integration C. By varying the value of C, we "move the solution curve" up or down, and a particular curve can be found that will pass through any desired point. The secrets of the world are seldom observed as explicit formulas. Instead, we usually measure how a change in one variable affects another variable. When this is translated into a mathematical model, the result is an equation involving the rate of change of the unknown function and the independent and/or dependent variable.

Consider the temperature y(t) of a cooling object. It might be conjectured that the rate of change of the temperature of the body is related to the temperature difference between its temperature and that of the surrounding medium. Experimental evidence verifies this conjecture. Newton's law of cooling asserts that the rate of change is directly proportional to the difference in these temperatures. If A is the temperature of the surrounding medium and y(t) is the temperature of the body at time t, then

(3) $\dfrac{dy}{dt} = -k(y - A)$,

where k is a positive constant. The negative sign is required because dy/dt will be negative when the temperature of the body is greater than the temperature of the medium.

If the temperature of the object is known at time t = 0, we call this an initial condition and include this information in the statement of the problem. Usually, we are asked to solve

(4) $\dfrac{dy}{dt} = -k(y - A)$ with $y(0) = y_0$.

The technique of separation of variables can be used to find the solution

(5) $y = A + (y_0 - A)e^{-kt}$.

For each choice of $y_0$, the solution curve will be different, and there is no simple way to move one curve around to get another one. The initial value is a point where the desired solution is "nailed down." Several solution curves are shown in Figure 9.3, and it can be observed that as t gets large the temperature of the object approaches room temperature. If $y_0 < A$, the body is warming instead of cooling.
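
Formula (5) can be checked numerically against equation (4). The sketch below is only an illustration: the values of A, k, and y0 are made up for the example, and the derivative is estimated with a centered difference rather than computed exactly.

```matlab
% Check that y = A + (y0-A)*exp(-k*t) satisfies y' = -k*(y-A), y(0) = y0.
% A, k, and y0 are illustrative values, not data from the text.
A = 20;  k = 0.1;  y0 = 90;
y  = @(t) A + (y0 - A)*exp(-k*t);
t  = 0:5:60;
dt = 1e-6;
dydt = (y(t+dt) - y(t-dt))/(2*dt);          % centered-difference estimate of y'
residual = max(abs(dydt + k*(y(t) - A)));   % should be near zero
disp([y(0) y(60) residual])
```

The residual is limited by the finite-difference approximation, so it is small but not exactly zero; the value y(60) shows the temperature approaching A.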


Initial Value Problem 

Definition 9.1. A solution to the initial value problem (I.V.P.)

(6) $y' = f(t, y)$ with $y(t_0) = y_0$

on an interval $[t_0, b]$ is a differentiable function y = y(t) such that

(7) $y(t_0) = y_0$ and $y'(t) = f(t, y(t))$ for all $t \in [t_0, b]$.

Notice that the solution curve y = y(t) must pass through the initial point $(t_0, y_0)$.


Geometric Interpretation 

At each point (t, y) in the rectangular region $R = \{(t, y) : a \le t \le b,\ c \le y \le d\}$, the slope of a solution curve y = y(t) can be found using the implicit formula m = f(t, y(t)). Hence the values $m_{i,j} = f(t_i, y_j)$ can be computed throughout the rectangle, and each value $m_{i,j}$ represents the slope of the line tangent to a solution curve that passes through the point $(t_i, y_j)$.

A slope field or direction field is a graph that indicates the slopes $\{m_{i,j}\}$ over the region. It can be used to visualize how a solution curve "fits" the slope constraint. To move along a solution curve, one must start at the initial point and check the slope field to determine in which direction to move. Then take a small step from $t_0$ to $t_0 + h$ horizontally and move the appropriate vertical distance $hf(t_0, y_0)$ so that the resulting displacement has the required slope. The next point on the solution curve is $(t_1, y_1)$. Repeat the process to continue your journey along the curve. Since a finite number of steps will be used, the method will produce an approximation to the solution.

Example 9.1. The slope field for $y' = (t - y)/2$ over the rectangle $R = \{(t, y) : 0 \le t \le 5,\ 0 \le y \le 4\}$ is shown in Figure 9.4. The solution curves with the following initial values are shown:

1. For y(0) = 1, the solution is $y(t) = 3e^{-t/2} - 2 + t$.

2. For y(0) = 4, the solution is $y(t) = 6e^{-t/2} - 2 + t$.

Definition 9.2. Given the rectangle $R = \{(t, y) : a \le t \le b,\ c \le y \le d\}$, assume that f(t, y) is continuous on R. The function f is said to satisfy a Lipschitz condition in the variable y on R provided that a constant L > 0 exists with the property that

(8) $|f(t, y_1) - f(t, y_2)| \le L|y_1 - y_2|$

whenever $(t, y_1), (t, y_2) \in R$. The constant L is called a Lipschitz constant for f.

Theorem 9.1. Suppose that f(t, y) is defined on the region R. If there exists a constant L > 0 so that

(9) $|f_y(t, y)| \le L$ for all $(t, y) \in R$,

then f satisfies a Lipschitz condition in the variable y with Lipschitz constant L over the rectangle R.

Proof. Fix t and use the mean value theorem to get $c_1$ with $y_1 < c_1 < y_2$ so that

$|f(t, y_1) - f(t, y_2)| = |f_y(t, c_1)(y_1 - y_2)| = |f_y(t, c_1)|\,|y_1 - y_2| \le L|y_1 - y_2|$.

Theorem 9.2 (Existence and Uniqueness). Assume that f(t, y) is continuous in a region $R = \{(t, y) : t_0 \le t \le b,\ c \le y \le d\}$. If f satisfies a Lipschitz condition on R in the variable y and $(t_0, y_0) \in R$, then the initial value problem (6), $y' = f(t, y)$ with $y(t_0) = y_0$, has a unique solution y = y(t) on some subinterval $t_0 \le t \le t_0 + \delta$.

Proof. See a text on differential equations such as Reference [38].

Let us apply Theorems 9.1 and 9.2 to the function f(t, y) = (t - y)/2. The partial derivative is $f_y(t, y) = -1/2$. Hence $|f_y(t, y)| \le \frac{1}{2}$ and, according to Theorem 9.1, the Lipschitz constant is $L = \frac{1}{2}$. Therefore, by Theorem 9.2 the I.V.P. has a unique solution.
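
The Lipschitz bound (8) for f(t, y) = (t - y)/2 with L = 1/2 can also be spot-checked on a grid. This is a sanity check rather than a proof, and the sample grid over the rectangle of Example 9.1 is an arbitrary choice.

```matlab
% Spot check of |f(t,y1)-f(t,y2)| <= L*|y1-y2| for f(t,y) = (t-y)/2, L = 1/2.
% The grid spacing is an arbitrary choice made for this illustration.
f = @(t,y) (t - y)/2;
L = 1/2;
[t,y1,y2] = ndgrid(0:0.5:5, 0:0.5:4, 0:0.5:4);
lhs = abs(f(t,y1) - f(t,y2));
rhs = L*abs(y1 - y2);
worst = max(lhs(:) - rhs(:));    % a positive value would violate (8)
disp(worst)
```

Because f is linear in y, the two sides of (8) agree at every sample, so the largest difference is not positive.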

Sketches of the slope field and solution curves can be constructed by using the meshgrid and quiver commands in MATLAB. The following M-file will generate a graph analogous to Figure 9.4. In general, care must be taken to avoid points (t, y) at which y' is undefined.

[t,y]=meshgrid(1:5,4:-1:1);
dt=ones(size(t));   %same size as t and y, as quiver requires
dy=(t-y)/2;
quiver(t,y,dt,dy);
hold on
x=0:.01:5;
z1=3*exp(-x/2)-2+x;
z2=6*exp(-x/2)-2+x;
plot(x,z1,x,z2)
hold off


Exercises for Introduction to Differential Equations

In Exercises 1 through 5:

(a) Show that y(t) is the solution to the differential equation by substituting y(t) and y'(t) into the differential equation $y'(t) = f(t, y(t))$.

(b) Use Theorem 9.1 to find a Lipschitz constant L for the rectangle $R = \{(t, y) : 0 \le t \le 3,\ 0 \le y \le 5\}$.

1. $y' = t^2 - y$, $y(t) = Ce^{-t} + t^2 - 2t + 2$

2. $y' = 3y + 3t$, $y(t) = Ce^{3t} - t - \frac{1}{3}$

3. $y' = -ty$, $y(t) = Ce^{-t^2/2}$

4. $y' = e^{-2t} - 2y$, $y(t) = Ce^{-2t} + te^{-2t}$

5. $y' = 2ty^2$, $y(t) = 1/(C - t^2)$

In Exercises 6 through 9, construct a graph of the slope field $m_{i,j} = f(t_i, y_j)$ over the rectangle $R = \{(t, y) : 0 \le t \le 4,\ 0 \le y \le 4\}$ and the indicated solution curves on the same coordinate system.

6. $y' = -t/y$, $y(t) = (C - t^2)^{1/2}$ for C = 1, 2, 4, 9

7. $y' = t/y$, $y(t) = (C + t^2)^{1/2}$ for C = -4, -1, 1, 4

8. $y' = 1/y$, $y(t) = (C + 2t)^{1/2}$ for C = -4, -2, 0, 2

9. $y' = y^2$, $y(t) = 1/(C - t)$ for C = 1, 2, 3, 4

10. Here is an example of an initial value problem that has "two solutions": $y' = \frac{3}{2}y^{1/3}$ with y(0) = 0.

(a) Verify that y(t) = 0 for $t \ge 0$ is a solution.

(b) Verify that $y(t) = t^{3/2}$ for $t \ge 0$ is a solution.

(c) Does this violate Theorem 9.2? Why?

11. Consider the initial value problem

$y' = (1 - y^2)^{1/2}$, y(0) = 0.

(a) Verify that y(t) = sin(t) is a solution on $[0, \pi/4]$.

(b) Determine the largest interval over which the solution exists.

12. Show that the definite integral $\int_a^b f(t)\,dt$ can be computed by solving the initial value problem

$y' = f(t)$ for $a \le t \le b$ with y(a) = 0.

In Exercises 13 through 15, find the solution to the I.V.P. 

13. $y' = 3t^2 + \sin(t)$, y(0) = 2
w. y’ = ^y(0) = 0 

15. y’ — y( 0) = 0. Hint. This answer must be expressed as a certain integral. 

16. Consider the first-order differential equation

$y'(t) + p(t)y(t) = q(t)$.

Show that the general solution y(t) can be found by using two special integrals. First define F(t) as follows:

$F(t) = e^{\int p(t)\,dt}$.

Second, define y(t) as follows:

$y(t) = \dfrac{1}{F(t)}\left(\int F(t)q(t)\,dt + C\right)$.

Hint. Differentiate the product F(t)y(t).

17. Consider the decay of a radioactive substance. If y(t) is the amount of substance present at time t, then y(t) decreases and experiments have verified that the rate of change of y(t) is proportional to the amount of undecayed material. Hence the I.V.P. for the decay of a radioactive substance is

$y' = -ky$ with $y(0) = y_0$.

(a) Show that the solution is $y(t) = y_0e^{-kt}$.

(b) The half-life of a radioactive substance is the time required for half of an initial amount to decay. The half-life of $^{14}$C is 5730 years. Find the formula y(t) that gives the amount of $^{14}$C present at time t. Hint. Find k so that $y(5730) = 0.5y_0$.

(c) A piece of wood is analyzed and the amount of $^{14}$C present is 0.712 of the amount that was present when the tree was alive. How old is the sample of wood?

(d) At a certain instant, 10 mg of a radioactive substance is present. After 23 seconds, only 1 mg is present. What is the half-life of the substance?

In Exercises 18 through 19, derive an equation for the I.V.P. and find its solution.

18. Annual ticket sales for a new professional soccer league are projected to grow at a rate proportional to the difference between sales at time t and an upper bound of $300 million. Assume that annual ticket sales are initially $0 and must be $40 million after 3 years (or the league folds). Based on these assumptions, how long will it take for annual ticket sales to reach $220 million?

19. The interior volume of a new library is 5 million cubic feet. The ventilation system introduces fresh air into the library at the rate of 45,000 cubic feet per minute. Before the ventilation system is turned on, the percents of carbon dioxide in the interior of the library and in the exterior fresh air are measured at 0.4% and 0.5%, respectively. Determine the percentage of carbon dioxide in the library 2 hours after the ventilation system is started.


9.2 Euler's Method 

The reader should be convinced that not all initial value problems can be solved explicitly, and often it is impossible to find a formula for the solution y(t); for example, there is no "closed-form expression" for the solution to $y' = t^3 + y^2$ with y(0) = 0. Hence for engineering and scientific purposes it is necessary to have methods for approximating the solution. If a solution with many significant digits is required, then more computing effort and a sophisticated algorithm must be used.

The first approach is called Euler's method and serves to illustrate the concepts involved in the advanced methods. It has limited usage because of the larger error that is accumulated as the process proceeds. However, it is important to study because the error analysis is easier to understand.

Let [a, b] be the interval over which we want to find the solution to the well-posed I.V.P. $y' = f(t, y)$ with $y(a) = y_0$. In actuality, we will not find a differentiable function that satisfies the I.V.P. Instead, a set of points $\{(t_k, y_k)\}$ is generated, and the points are used for an approximation (i.e., $y(t_k) \approx y_k$). How can we proceed to construct a "set of points" that will "satisfy a differential equation approximately"? First we choose the abscissas for the points. For convenience we subdivide the interval [a, b] into M equal subintervals and select the mesh points

(1) $t_k = a + kh$ for k = 0, 1, ..., M where $h = \dfrac{b - a}{M}$.

The value h is called the step size. We now proceed to solve approximately

(2) $y' = f(t, y)$ over $[t_0, t_M]$ with $y(t_0) = y_0$.

Assume that y(t), y'(t), and y''(t) are continuous and use Taylor's theorem to expand y(t) about $t = t_0$. For each value t there exists a value $c_1$ that lies between $t_0$ and t so that

(3) $y(t) = y(t_0) + y'(t_0)(t - t_0) + \dfrac{y''(c_1)(t - t_0)^2}{2}$.

When $y'(t_0) = f(t_0, y(t_0))$ and $h = t_1 - t_0$ are substituted in equation (3), the result is an expression for $y(t_1)$:

(4) $y(t_1) = y(t_0) + hf(t_0, y(t_0)) + y''(c_1)\dfrac{h^2}{2}$.

If the step size h is chosen small enough, then we may neglect the second-order term (involving $h^2$) and get

(5) $y_1 = y_0 + hf(t_0, y_0)$,

which is Euler's approximation.

The process is repeated and generates a sequence of points that approximates the solution curve y = y(t). The general step for Euler's method is

(6) $t_{k+1} = t_k + h$, $y_{k+1} = y_k + hf(t_k, y_k)$ for k = 0, 1, ..., M - 1.
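
The general step (6) translates almost line for line into a loop. The following is a minimal sketch rather than a program from the text: it assumes the right-hand side f(t, y) = (t - y)/2 with y(0) = 1 on [0, 3], and the variable names T and Y are chosen here for illustration.

```matlab
% Sketch of Euler's general step for y' = (t-y)/2 on [0,3] with y(0) = 1.
f = @(t,y) (t - y)/2;
a = 0;  b = 3;  ya = 1;  M = 3;      % M subintervals, so h = 1
h = (b - a)/M;
T = a + (0:M)*h;                     % mesh points t_k = a + k*h
Y = zeros(1,M+1);
Y(1) = ya;                           % Y(k+1) holds y_k (indices start at 1)
for k = 1:M
   Y(k+1) = Y(k) + h*f(T(k),Y(k));   % y_{k+1} = y_k + h*f(t_k,y_k)
end
disp([T' Y'])
```

With h = 1 the loop takes three steps, giving the ordinates 1, 0.5, 0.75, and 1.375.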

Example 9.2. Use Euler's method to solve approximately the initial value problem:

(7) $y' = Ry$ over [0, 1] with $y(0) = y_0$ and R constant.

The step size must be chosen, and then the second formula in (6) can be determined for computing the ordinates. This formula is sometimes called a difference equation and in this case it is

(8) $y_{k+1} = y_k(1 + hR)$ for k = 0, 1, ..., M - 1.

If we trace the solution values recursively, we see that

(9) $y_1 = y_0(1 + hR)$
$y_2 = y_1(1 + hR) = y_0(1 + hR)^2$
$\vdots$
$y_M = y_{M-1}(1 + hR) = y_0(1 + hR)^M$.

Table 9.1 Compound Interest in Example 9.3

For most problems there is no explicit formula for determining the solution points, and each new point must be computed successively from the previous point. However, for the initial value problem (7) we are fortunate; Euler's method has the explicit solution

(10) $t_k = kh$, $y_k = y_0(1 + hR)^k$ for k = 0, 1, ..., M.

Formula (10) can be viewed as the "compound interest" formula, and the Euler approximation gives the future value of a deposit.

Example 9.3. Suppose that $1000 is deposited and earns 10% interest compounded continuously over 5 years. What is the value at the end of 5 years?

We choose to use Euler approximations with $h = 1$, $\frac{1}{12}$, and $\frac{1}{360}$ to approximate y(5) for the I.V.P.:

$y' = 0.1y$ over [0, 5] with y(0) = 1000.

Formula (10) with R = 0.1 produces Table 9.1.

Think about the different values $y_5$, $y_{60}$, and $y_{1800}$ that are used to determine the future value after 5 years. These values are obtained using different step sizes and reflect different amounts of computing effort to obtain an approximation to y(5). The solution to the I.V.P. is $y(5) = 1000e^{0.5} = 1648.72$. If we did not use the closed-form solution (10), then it would have required 1800 iterations of Euler's method to obtain $y_{1800}$, and we still have only five digits of accuracy in the answer!
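
The three approximations discussed in Example 9.3 can be regenerated from formula (10). The sketch below assumes only the data stated in the example (R = 0.1, a deposit of 1000, 5 years, and the step sizes 1, 1/12, and 1/360); the variable names are illustrative.

```matlab
% Euler approximations to y(5) from formula (10): y_M = y0*(1+h*R)^M.
R = 0.1;  y0 = 1000;  T = 5;
M = [5 60 1800];           % number of steps for h = 1, 1/12, 1/360
h = T./M;
yM = y0*(1 + h*R).^M;      % closed-form Euler values y_5, y_60, y_1800
exact = y0*exp(R*T);       % continuous compounding, 1000*e^0.5
disp([h' M' yM' (exact - yM)'])
```

Each row lists the step size, the number of steps, the Euler value, and its shortfall from the continuously compounded amount; the shortfall shrinks as h decreases.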

If bankers had to approximate the solution to the I.V.P. (7), they would choose Euler's method because of the explicit formula in (10). The more sophisticated methods for approximating solutions do not have an explicit formula for finding $y_k$, but they will require less computing effort.


Geometric Description

If you start at the point $(t_0, y_0)$ and compute the value of the slope $m_0 = f(t_0, y_0)$ and move horizontally the amount h and vertically $hf(t_0, y_0)$, then you are moving along the tangent line to y(t) and will end up at the point $(t_1, y_1)$ (see Figure 9.5). Notice that $(t_1, y_1)$ is not on the desired solution curve! But this is the approximation that we are generating. Hence we must use $(t_1, y_1)$ as though it were correct and proceed by computing the slope $m_1 = f(t_1, y_1)$ and using it to obtain the next vertical displacement $hf(t_1, y_1)$ to locate $(t_2, y_2)$, and so on.

Figure 9.5 Euler's approximations $y_{k+1} = y_k + hf(t_k, y_k)$.




Step Size versus Error 


The methods we introduce for approximating the solution of an initial value problem are called difference methods or discrete variable methods. The solution is approximated at a set of discrete points called a grid (or mesh) of points. An elementary single-step method has the form $y_{k+1} = y_k + h\Phi(t_k, y_k)$ for some function $\Phi$ called an increment function.

When using any discrete variable method to approximately solve an initial value problem, there are two sources of error: discretization and round off.

Definition 9.3 (Discretization Error). Assume that $\{(t_k, y_k)\}_{k=0}^{M}$ is the set of discrete approximations and that y = y(t) is the unique solution to the initial value problem.

The global discretization error $e_k$ is defined by

(11) $e_k = y(t_k) - y_k$ for k = 0, 1, ..., M.

It is the difference between the unique solution and the solution obtained by the discrete variable method.

The local discretization error $\epsilon_{k+1}$ is defined by

(12) $\epsilon_{k+1} = y(t_{k+1}) - y_k - h\Phi(t_k, y_k)$ for k = 0, 1, ..., M - 1.

It is the error committed in the single step from $t_k$ to $t_{k+1}$.

When we obtained equation (6) for Euler's method, the neglected term for each step was $y^{(2)}(c_k)(h^2/2)$. If this was the only error at each step, then at the end of the interval [a, b], after M steps have been made, the accumulated error would be

$\sum_{k=1}^{M} y^{(2)}(c_k)\dfrac{h^2}{2} \approx My^{(2)}(c)\dfrac{h^2}{2} = \dfrac{hM}{2}y^{(2)}(c)h = \dfrac{(b - a)y^{(2)}(c)}{2}h = O(h^1)$.

There could be more error, but this estimate predominates. A detailed discussion on this topic can be found in advanced texts on numerical methods for differential equations (Reference [75]).


Theorem 9.3 (Precision of Euler's Method). Assume that $y(t)$ is the solution to
the I.V.P. given in (2). If $y(t) \in C^2[t_0, b]$ and $\{(t_k, y_k)\}_{k=0}^{M}$ is the sequence of
approximations generated by Euler's method, then

$|e_k| = |y(t_k) - y_k| = O(h)$,

$|\epsilon_{k+1}| = |y(t_{k+1}) - y_k - hf(t_k, y_k)| = O(h^2)$.

The error at the end of the interval is called the final global error (F.G.E.):

(14) $E(y(b), h) = |y(b) - y_M| = O(h)$.

Remark. The final global error E(y(b), h ) is used to study the behavior of the error for 
various step sizes. It can be used to give us an idea of how much computing effort must 
be done to obtain an accurate approximation. 

Examples 9.4 and 9.5 illustrate the concepts in Theorem 9.3. If approximations are
computed using the step sizes $h$ and $h/2$, we should have

(15) $E(y(b), h) \approx Ch$

for the larger step size, and

(16) $E(y(b), \frac{h}{2}) \approx C\frac{h}{2} = \frac{1}{2}Ch \approx \frac{1}{2}E(y(b), h)$.

Hence the idea in Theorem 9.3 is that if the step size in Euler's method is reduced by a
factor of $\frac{1}{2}$ we can expect that the overall F.G.E. will be reduced by a factor of $\frac{1}{2}$.
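The halving behavior in (15) and (16) can be checked numerically. What follows is a minimal Python sketch, not the book's MATLAB program; the names `euler`, `f`, `exact`, and `fge` are illustrative assumptions. It uses the test problem $y' = (t - y)/2$, $y(0) = 1$ on $[0, 3]$, whose exact solution is $y(t) = 3e^{-t/2} - 2 + t$.

```python
import math

def euler(f, a, b, ya, M):
    """Euler's method: return (T, Y) with M+1 points for y' = f(t, y), y(a) = ya."""
    h = (b - a) / M
    T = [a + k * h for k in range(M + 1)]
    Y = [ya]
    for k in range(M):
        # y_{k+1} = y_k + h f(t_k, y_k)
        Y.append(Y[k] + h * f(T[k], Y[k]))
    return T, Y

def f(t, y):
    # right-hand side of the test problem y' = (t - y)/2
    return (t - y) / 2

def exact(t):
    # exact solution for y(0) = 1
    return 3 * math.exp(-t / 2) - 2 + t

def fge(M):
    """Final global error |y(3) - y_M| for M Euler steps on [0, 3]."""
    _, Y = euler(f, 0.0, 3.0, 1.0, M)
    return abs(exact(3.0) - Y[-1])

for M in (3, 6, 12, 24, 48):
    # error with step h, and the ratio of the error at h/2 to the error at h
    print(M, fge(M), fge(2 * M) / fge(M))
```

Each printed ratio should come out close to 1/2, which is what (16) predicts for a method whose F.G.E. is $O(h)$.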

Example 9.4. Use Euler's method to solve the I.V.P.

$y' = \frac{t - y}{2}$ on $[0, 3]$ with $y(0) = 1$.

Compare solutions for $h = 1$, $\frac{1}{2}$, $\frac{1}{4}$, and $\frac{1}{8}$.
Figure 9.6 Comparison of Euler solutions with different
step sizes for $y' = (t - y)/2$ over $[0, 3]$ with the initial
condition $y(0) = 1$.


Figure 9.6 shows graphs of the four Euler solutions and the exact solution curve
$y(t) = 3e^{-t/2} - 2 + t$. Table 9.2 gives the values for the four solutions at selected
abscissas. For the step size $h = 0.25$, the calculations are

$y_1 = 1.0 + 0.25\left(\frac{0.0 - 1.0}{2}\right) = 0.875$,

$y_2 = 0.875 + 0.25\left(\frac{0.25 - 0.875}{2}\right) = 0.796875$, etc.

This iteration continues until we arrive at the last step:

$y(3) \approx y_{12} = 1.440573 + 0.25\left(\frac{2.75 - 1.440573}{2}\right) = 1.604252$.


Example 9.5. Compare the F.G.E. when Euler's method is used to solve the I.V.P.

$y' = \frac{t - y}{2}$ over $[0, 3]$ with $y(0) = 1$,

using step sizes $1, \frac{1}{2}, \ldots, \frac{1}{64}$.

Table 9.3 gives the F.G.E. for several step sizes and shows that the error in the
approximation to $y(3)$ decreases by about $\frac{1}{2}$ when the step size is reduced by a
factor of $\frac{1}{2}$. For the smaller step sizes the conclusion of Theorem 9.3 is easy to see:

$E(y(3), h) = y(3) - y_M = O(h^1) \approx Ch$, where $C = 0.256$.
Table 9.2 Comparison of Euler Solutions with Different Step Sizes for y' = (t - y)/2
over [0, 3] with y(0) = 1

t_k      h = 1     h = 1/2     h = 1/4     h = 1/8     y(t_k) Exact
0.125                                      0.9375      0.943239
0.25                           0.875       0.886719    0.897491
0.375                                      0.846924    0.862087
0.50               0.75        0.796875    0.817429    0.836402
0.75                           0.759766    0.786802    0.811868
1.00     0.5       0.6875      0.758545    0.790158    0.819592
1.50               0.765625    0.846386    0.882855    0.917100
2.00     0.75      0.949219    1.030827    1.068222    1.103638
2.50               1.211914    1.289227    1.325176    1.359514
3.00     1.375     1.533936    1.604252    1.637429    1.669390



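Table 9.3, last row: $h = \frac{1}{64}$, $M = 192$, $y_M = 1.665459$, F.G.E. $y(3) - y_M = 0.003931$, $Ch = 0.004$ (with $C = 0.256$).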
Program 9.1 (Euler's Method). To approximate the solution of the initial value
problem $y' = f(t, y)$ with $y(a) = y_0$ over $[a, b]$ by computing

$y_{k+1} = y_k + hf(t_k, y_k)$ for $k = 0, 1, \ldots, M - 1$.

function E=euler(f,a,b,ya,M)
%Input  - f is the function entered as a string 'f'
%       - a and b are the left and right end points
%       - ya is the initial condition y(a)
%       - M is the number of steps
%Output - E=[T' Y'] where T is the vector of abscissas and
%         Y is the vector of ordinates
h=(b-a)/M;
T=zeros(1,M+1);
Y=zeros(1,M+1);
T=a:h:b;
Y(1)=ya;
for j=1:M
   % Euler step: y_{k+1} = y_k + h*f(t_k, y_k)
   Y(j+1)=Y(j)+h*feval(f,T(j),Y(j));
end
E=[T' Y'];

Exercises for Euler's Method

In Exercises 1 through 5 solve the differential equations by the Euler method.

(a) Let $h = 0.2$ and do two steps by hand calculation. Then let $h = 0.1$ and do four
steps by hand calculation.

(b) Compare the exact solution $y(0.4)$ with the two approximations in part (a).

(c) Does the F.G.E. in part (a) behave as expected when $h$ is halved?

1. $y' = t^2 - y$ with $y(0) = 1$, $y(t) = -e^{-t} + t^2 - 2t + 2$

2. $y' = 3y + 3t$ with $y(0) = 1$, $y(t) = \frac{4}{3}e^{3t} - t - \frac{1}{3}$

3. $y' = -ty$ with $y(0) = 1$, $y(t) = e^{-t^2/2}$

4. $y' = e^{-2t} - 2y$ with $y(0) = \frac{1}{10}$, $y(t) = \frac{1}{10}e^{-2t} + te^{-2t}$

5. $y' = 2ty^2$ with $y(0) = 1$, $y(t) = 1/(1 - t^2)$

6. Logistic population growth. The population curve $P(t)$ for the United States is
assumed to obey the differential equation for a logistic curve $P' = aP - bP^2$. Let $t$
denote the year past 1900, and let the step size be $h = 10$. The values $a = 0.01$ and
$b = 0.00004$ produce a model for the population. Using hand calculations, find the
Euler approximations to $P(t)$ and fill in the following table. Round off each value
to the nearest tenth.
7. Show that when Euler's method is used to solve the I.V.P.

$y' = f(t)$ over $[a, b]$ with $y(a) = y_0 = 0$

the result is

$y(b) \approx \sum_{k=0}^{M-1} f(t_k)h$,

which is a Riemann sum that approximates the definite integral of $f(t)$ taken over the
interval $[a, b]$.

8. Show that Euler's method fails to approximate the solution $y(t) = t^{3/2}$ of the I.V.P.

$y' = f(t, y) = 1.5y^{1/3}$ with $y(0) = 0$.

Justify your answer. What difficulties were encountered?

9. Can Euler's method be used to solve the I.V.P.

$y' = 1 + y^2$ over $[0, 3]$ with $y(0) = 0$?

Hint. The exact solution curve is $y(t) = \tan(t)$.


Algorithms and Programs 


In Problems 1 through 5, solve the differential equations by the Euler method.

(a) Let $h = 0.1$ and do 20 steps with Program 9.1. Then let $h = 0.05$ and do 40 steps
with Program 9.1.

(b) Compare the exact solution $y(2)$ with the two approximations in part (a).

(c) Does the F.G.E. in part (a) behave as expected when $h$ is halved?

(d) Plot the two approximations and the exact solution on the same coordinate system.
Hint. The output matrix E from Program 9.1 contains the x and y coordinates of
the approximations. The command plot(E(:,1),E(:,2)) will produce a graph
analogous to Figure 9.6.

1. $y' = t^2 - y$ with $y(0) = 1$, $y(t) = -e^{-t} + t^2 - 2t + 2$

2. $y' = 3y + 3t$ with $y(0) = 1$, $y(t) = \frac{4}{3}e^{3t} - t - \frac{1}{3}$

3. $y' = -ty$ with $y(0) = 1$, $y(t) = e^{-t^2/2}$

4. $y' = e^{-2t} - 2y$ with $y(0) = \frac{1}{10}$, $y(t) = \frac{1}{10}e^{-2t} + te^{-2t}$

5. $y' = 2ty^2$ with $y(0) = 1$, $y(t) = 1/(1 - t^2)$

6. Consider $y' = 0.12y$ over $[0, 5]$ with $y(0) = 1000$.

(a) Apply formula (10) to find Euler's approximation to $y(5)$ using the step sizes
$h = 1$, $\frac{1}{12}$, and $\frac{1}{360}$.

(b) What is the limit in part (a) when $h$ goes to zero?

7. Exponential population growth. The population of a certain species grows at a rate
that is proportional to the current population and obeys the I.V.P.

$y' = 0.02y$ over $[0, 5]$ with $y(0) = 5000$.

(a) Apply formula (10) to find Euler's approximation to $y(5)$ using the step sizes
$h = 1$, $\frac{1}{12}$, and $\frac{1}{360}$.

(b) What is the limit in part (a) when $h$ goes to zero?

8. A skydiver jumps from a plane, and up to the moment he opens the parachute the
air resistance is proportional to $v^{3/2}$ ($v$ represents velocity). Assume that the time
interval is $[0, 6]$ and that the differential equation for the downward direction is

$v' = 32 - 0.032v^{3/2}$ over $[0, 6]$ with $v(0) = 0$.

Use Euler's method with $h = 0.05$ and estimate $v(6)$.

9. Epidemic model. The mathematical model for epidemics is described as follows.
Assume that there is a community of $L$ members that contains $P$ infected individuals
and $Q$ uninfected individuals. Let $y(t)$ denote the number of infected individuals at
time $t$. For a mild illness, such as the common cold, everyone continues to be active
and the epidemic spreads from those who are infected to those uninfected. Since
there are $PQ$ possible contacts between these two groups, the rate of change of $y(t)$
is proportional to $PQ$. Hence the problem can be stated as the I.V.P.

$y' = ky(L - y)$ with $y(0) = y_0$.

(a) Use $L = 25{,}000$, $k = 0.00003$, and $h = 0.2$ with the initial condition $y(0) = 250$,
and use Program 9.1 to compute Euler's approximate solution over $[0, 60]$.

(b) Plot the graph of the approximate solution from part (a).

(c) Estimate the average number of individuals infected by finding the average of
the ordinates from Euler's method in part (a).

(d) Estimate the average number of individuals infected by fitting a curve to the data
from part (a) and using Theorem 1.10 (Mean Value Theorem for Integrals).

10. Consider the first-order integro-ordinary differential equation

$y' = 1.3y - 0.25y^2 - 0.0001y \int_0^t y(\tau)\,d\tau$.

(a) Use Euler's method with $h = 0.2$ and $y(0) = 250$ over the interval $[0, 20]$, and
the trapezoidal rule to find an approximate solution to the equation. Hint. The
general step for Euler's method (6) is

$y_{k+1} = y_k + h\left(1.3y_k - 0.25y_k^2 - 0.0001y_k \int_0^{t_k} y(\tau)\,d\tau\right)$.

If the trapezoidal rule is used to approximate the integral, then this expression
becomes

$y_{k+1} = y_k + h(1.3y_k - 0.25y_k^2 - 0.0001y_k T_k(h))$,

where $T_0(h) = 0$ and

$T_k(h) = T_{k-1}(h) + \frac{h}{2}(y_{k-1} + y_k)$ for $k = 1, 2, \ldots, 99$.

(b) Repeat part (a) using the initial values $y(0) = 200$ and $y(0) = 300$.

(c) Plot the approximate solutions from parts (a) and (b) on the same coordinate
system.


9.3 Heun's Method

The next approach, Heun's method, introduces a new idea for constructing an
algorithm to solve the I.V.P.

(1) $y'(t) = f(t, y(t))$ over $[a, b]$ with $y(t_0) = y_0$.

To obtain the solution point $(t_1, y_1)$, we can use the fundamental theorem of calculus
and integrate $y'(t)$ over $[t_0, t_1]$ to get

(2) $\int_{t_0}^{t_1} f(t, y(t))\,dt = \int_{t_0}^{t_1} y'(t)\,dt = y(t_1) - y(t_0)$,

where the antiderivative of $y'(t)$ is the desired function $y(t)$. When equation (2) is
solved for $y(t_1)$, the result is

(3) $y(t_1) = y(t_0) + \int_{t_0}^{t_1} f(t, y(t))\,dt$.

Now a numerical integration method can be used to approximate the definite
integral in (3). If the trapezoidal rule is used with the step size $h = t_1 - t_0$, then the result
is

(4) $y(t_1) \approx y(t_0) + \frac{h}{2}(f(t_0, y(t_0)) + f(t_1, y(t_1)))$.

Notice that the formula on the right-hand side of (4) involves the yet to be
determined value $y(t_1)$. To proceed, we use an estimate for $y(t_1)$. Euler's solution will
suffice for this purpose. After it is substituted into (4), the resulting formula for finding
$(t_1, y_1)$ is called Heun's method:

(5) $y_1 = y(t_0) + \frac{h}{2}(f(t_0, y_0) + f(t_1, y_0 + hf(t_0, y_0)))$.

The process is repeated and generates a sequence of points that approximates the
solution curve $y = y(t)$. At each step, Euler's method is used as a prediction, and then
the trapezoidal rule is used to make a correction to obtain the final value. The general
step for Heun's method is

(6) $p_{k+1} = y_k + hf(t_k, y_k)$, $t_{k+1} = t_k + h$,
    $y_{k+1} = y_k + \frac{h}{2}(f(t_k, y_k) + f(t_{k+1}, p_{k+1}))$.


Notice the role played by differentiation and integration in Heun's method. Draw
the line tangent to the solution curve $y = y(t)$ at the point $(t_0, y_0)$ and use it to find the
predicted point $(t_1, p_1)$. Now look at the graph $z = f(t, y(t))$ and consider the points
$(t_0, f_0)$ and $(t_1, f_1)$, where $f_0 = f(t_0, y_0)$ and $f_1 = f(t_1, p_1)$. The area of the
trapezoid with vertices $(t_0, f_0)$ and $(t_1, f_1)$ is an approximation to the integral in (3), which
is used to obtain the final value in equation (5). The graphs are shown in Figure 9.7.
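The predictor-corrector step described above can be sketched in a few lines. This is a hedged Python illustration rather than the book's MATLAB program; the function name `heun_step` and the variable names are assumptions made for the example. It is applied to one step of $y' = (t - y)/2$, $y(0) = 1$ with $h = 0.25$.

```python
def heun_step(f, t, y, h):
    """One Heun step: Euler predictor followed by trapezoidal corrector."""
    f0 = f(t, y)              # slope at the left end point
    p = y + h * f0            # Euler prediction p_1
    f1 = f(t + h, p)          # slope at the predicted point
    return y + (h / 2) * (f0 + f1)   # trapezoidal correction

def f(t, y):
    # test problem y' = (t - y)/2
    return (t - y) / 2

y1 = heun_step(f, 0.0, 1.0, 0.25)
print(y1)  # 0.8984375
```

Here the predictor gives $p_1 = 0.875$, the two slopes are $-0.5$ and $-0.3125$, and the corrected value is $y_1 = 1.0 + 0.125(-0.5 - 0.3125) = 0.8984375$.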


Step Size versus Error

The error term for the trapezoidal rule used to approximate the integral in (3) is

(7) $-y^{(2)}(c_k)\frac{h^3}{12}$.

If the only error at each step is that given in (7), after $M$ steps the accumulated error
for Heun's method would be

(8) $-\sum_{k=1}^{M} y^{(2)}(c_k)\frac{h^3}{12} \approx -\frac{b - a}{12} y^{(2)}(c) h^2 = O(h^2)$.

The next theorem is important, because it states the relationship between F.G.E.
and step size. It is used to give us an idea of how much computing effort must be done
to obtain an accurate approximation using Heun's method.

(a) Derivative predictor: $p_1 = y_0 + hf(t_0, y_0)$. (b) Integral corrector: $y_1 - y_0 = \frac{h}{2}(f_0 + f_1)$.

Figure 9.7 The graphs $y = y(t)$ and $z = f(t, y(t))$ in the derivation of Heun's method.

Theorem 9.4 (Precision of Heun's Method). Assume that $y(t)$ is the solution to
the I.V.P. (1). If $y(t) \in C^3[t_0, b]$ and $\{(t_k, y_k)\}_{k=0}^{M}$ is the sequence of approximations
generated by Heun's method, then

(9) $|e_k| = |y(t_k) - y_k| = O(h^2)$,

$|\epsilon_{k+1}| = |y(t_{k+1}) - y_k - h\Phi(t_k, y_k)| = O(h^3)$,

where $\Phi(t_k, y_k) = \frac{1}{2}(f(t_k, y_k) + f(t_{k+1}, y_k + hf(t_k, y_k)))$, so that $y_{k+1} = y_k + h\Phi(t_k, y_k)$.

In particular, the final global error (F.G.E.) at the end of the interval will satisfy

(10) $E(y(b), h) = |y(b) - y_M| = O(h^2)$.

Examples 9.6 and 9.7 illustrate Theorem 9.4. If approximations are computed
using the step sizes $h$ and $h/2$, we should have

(11) $E(y(b), h) \approx Ch^2$

for the larger step size, and

(12) $E(y(b), \frac{h}{2}) \approx C\frac{h^2}{4} = \frac{1}{4}Ch^2 \approx \frac{1}{4}E(y(b), h)$.

Hence the idea in Theorem 9.4 is that if the step size in Heun's method is reduced by a
factor of $\frac{1}{2}$ we can expect that the overall F.G.E. will be reduced by a factor of $\frac{1}{4}$.
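The factor of $\frac{1}{4}$ can be observed directly. The following Python sketch is an illustration only (the names `heun`, `f`, `exact`, and `fge` are assumptions, and it is not the book's MATLAB listing); it applies Heun's method to $y' = (t - y)/2$, $y(0) = 1$ on $[0, 3]$ and compares final global errors for successive halvings of $h$.

```python
import math

def heun(f, a, b, ya, M):
    """Heun's method: return (T, Y) with M+1 points for y' = f(t, y), y(a) = ya."""
    h = (b - a) / M
    T = [a + k * h for k in range(M + 1)]
    Y = [ya]
    for k in range(M):
        k1 = f(T[k], Y[k])                    # slope at (t_k, y_k)
        k2 = f(T[k + 1], Y[k] + h * k1)       # slope at the Euler prediction
        Y.append(Y[k] + (h / 2) * (k1 + k2))  # trapezoidal correction
    return T, Y

def f(t, y):
    return (t - y) / 2

def exact(t):
    return 3 * math.exp(-t / 2) - 2 + t

def fge(M):
    """Final global error |y(3) - y_M| for M Heun steps on [0, 3]."""
    _, Y = heun(f, 0.0, 3.0, 1.0, M)
    return abs(exact(3.0) - Y[-1])

for M in (3, 6, 12, 24, 48):
    # error with step h, and the ratio of the error at h/2 to the error at h
    print(M, fge(M), fge(2 * M) / fge(M))
```

The ratios should approach 1/4 as $h$ decreases, consistent with (12).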
Figure 9.8 Comparison of Heun solutions with different
step sizes for $y' = (t - y)/2$ over $[0, 2]$ with the initial
condition $y(0) = 1$.

Example 9.6. Use Heun's method to solve the I.V.P.

$y' = \frac{t - y}{2}$ on $[0, 3]$ with $y(0) = 1$.

Compare solutions for $h = 1$, $\frac{1}{2}$, $\frac{1}{4}$, and $\frac{1}{8}$.

Figure 9.8 shows the graphs of the first two Heun solutions and the exact solution curve
$y(t) = 3e^{-t/2} - 2 + t$. Table 9.4 gives the values for the four solutions at selected abscissas.
For the step size $h = 0.25$, a sample calculation is

$f(t_0, y_0) = \frac{0 - 1}{2} = -0.5$,

$p_1 = 1.0 + 0.25(-0.5) = 0.875$,

$f(t_1, p_1) = \frac{0.25 - 0.875}{2} = -0.3125$,

$y_1 = 1.0 + 0.125(-0.5 - 0.3125) = 0.8984375$.

This iteration continues until we arrive at the last step:

$y(3) \approx y_{12} = 1.511508 + 0.125(0.619246 + 0.666840) = 1.672269$.
Table 9.4 Comparison of Heun Solutions with Different Step Sizes for y' = (t - y)/2 over
[0, 3] with y(0) = 1

Table 9.5 Relation between Step Size and F.G.E. for Heun Solutions to
y' = (t - y)/2 over [0, 3] with y(0) = 1

Step       Number of    Approximation     F.G.E. Error at t = 3,    O(h^2) ≈ Ch^2
size, h    steps, M     to y(3), y_M      y(3) - y_M                where C = -0.0432
1          3            1.732422          -0.063032                 -0.043200
1/2        6            1.682121          -0.012731                 -0.010800
1/4        12           1.672269          -0.002879                 -0.002700
1/8        24           1.670076          -0.000686                 -0.000675
1/16       48           1.669558          -0.000168                 -0.000169
1/32       96           1.669432          -0.000042                 -0.000042
1/64       192          1.669401          -0.000011                 -0.000011


Table 9.5 gives the F.G.E. and shows that the error in the approximation to $y(3)$
decreases by about $\frac{1}{4}$ when the step size is reduced by a factor of $\frac{1}{2}$:

$E(y(3), h) = y(3) - y_M = O(h^2) \approx Ch^2$, where $C = -0.0432$.
Program 9.2 (Heun's Method). To approximate the solution of the initial value
problem $y' = f(t, y)$ with $y(a) = y_0$ over $[a, b]$ by computing

$y_{k+1} = y_k + \frac{h}{2}(f(t_k, y_k) + f(t_{k+1}, y_k + hf(t_k, y_k)))$

for $k = 0, 1, \ldots, M - 1$.

function H=heun(f,a,b,ya,M)
%Input  - f is the function entered as a string 'f'
%       - a and b are the left and right end points
%       - ya is the initial condition y(a)
%       - M is the number of steps
%Output - H=[T' Y'] where T is the vector of abscissas and
%         Y is the vector of ordinates
h=(b-a)/M;
T=zeros(1,M+1);
Y=zeros(1,M+1);
T=a:h:b;
Y(1)=ya;
for j=1:M
   k1=feval(f,T(j),Y(j));            % slope at (t_j, y_j)
   k2=feval(f,T(j+1),Y(j)+h*k1);     % slope at the Euler prediction
   Y(j+1)=Y(j)+(h/2)*(k1+k2);        % trapezoidal correction
end
H=[T' Y'];


Exercises for Heun's Method

In Exercises 1 through 5 solve the differential equations by Heun's method.

(a) Let $h = 0.2$ and do two steps by hand calculation. Then let $h = 0.1$ and do four
steps by hand calculation.

(b) Compare the exact solution $y(0.4)$ with the two approximations in part (a).

(c) Does the F.G.E. in part (a) behave as expected when $h$ is halved?

1. $y' = t^2 - y$ with $y(0) = 1$, $y(t) = -e^{-t} + t^2 - 2t + 2$

2. $y' = 3y + 3t$ with $y(0) = 1$, $y(t) = \frac{4}{3}e^{3t} - t - \frac{1}{3}$

3. $y' = -ty$ with $y(0) = 1$, $y(t) = e^{-t^2/2}$

4. $y' = e^{-2t} - 2y$ with $y(0) = \frac{1}{10}$, $y(t) = \frac{1}{10}e^{-2t} + te^{-2t}$

5. $y' = 2ty^2$ with $y(0) = 1$, $y(t) = 1/(1 - t^2)$

Notice that Heun's method will generate an approximation to $y(1)$ even though the
solution curve is not defined at $t = 1$.

6. Show that when Heun's method is used to solve the I.V.P. $y' = f(t)$ over $[a, b]$ with
$y(a) = y_0 = 0$ the result is

$y(b) \approx \frac{h}{2}\sum_{k=0}^{M-1}(f(t_k) + f(t_{k+1}))$,

which is the trapezoidal rule approximation for the definite integral of $f(t)$ taken over
the interval $[a, b]$.

7. The Richardson improvement method discussed in Lemma 7.1 (Section 7.3) can be
used in conjunction with Heun's method. If Heun's method is used with step size $h$,
then we have

$y(b) \approx y_h + Ch^2$.

If Heun's method is used with step size $2h$, we have

$y(b) \approx y_{2h} + 4Ch^2$.

The terms involving $Ch^2$ can be eliminated to obtain an improved approximation for
$y(b)$, and the result is

$y(b) \approx \frac{4y_h - y_{2h}}{3}$.

The improvement scheme can be used with the values in Example 9.7 to obtain better
approximations to $y(3)$. Find the missing entries in the table below.

h        y_h         (4y_h - y_2h)/3
1        1.732422
1/2      1.682121    1.665354
1/4      1.672269
1/8      1.670076
1/16     1.669558    1.669385
1/32     1.669432
1/64     1.669401
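The elimination step in the exercise above amounts to a one-line formula. A minimal Python sketch follows; the function name `richardson` and its parameter `p` (the order of the underlying method, 2 for Heun's method) are assumptions introduced for illustration. It is checked only against the two entries already filled in the table.

```python
def richardson(y_h, y_2h, p=2):
    """Combine approximations with step sizes h and 2h for a method of order p.

    Eliminates the leading C*h^p error term: (2^p * y_h - y_2h) / (2^p - 1).
    """
    factor = 2 ** p
    return (factor * y_h - y_2h) / (factor - 1)

# Entries taken from the table: h = 1/2 paired with h = 1,
# and h = 1/16 paired with h = 1/8.
print(round(richardson(1.682121, 1.732422), 6))
print(round(richardson(1.669558, 1.670076), 6))
```

With `p=2` the formula reduces to $(4y_h - y_{2h})/3$; passing `p=4` would give $(16y_h - y_{2h})/15$, the variant appropriate for a fourth-order method.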



8. Show that Heun's method fails to approximate the solution $y(t) = t^{3/2}$ of the I.V.P.
$y' = f(t, y) = 1.5y^{1/3}$ with $y(0) = 0$.

Justify your answer. What difficulties were encountered?
Algorithms and Programs

In Problems 1 through 5 solve the differential equations by Heun's method.

(a) Let $h = 0.1$ and do 20 steps with Program 9.2. Then let $h = 0.05$ and do 40 steps
with Program 9.2.

(b) Compare the exact solution $y(2)$ with the two approximations in part (a).

(c) Does the F.G.E. in part (a) behave as expected when $h$ is halved?

(d) Plot the two approximations and the exact solution on the same coordinate system.
Hint. The output matrix H from Program 9.2 contains the x and y coordinates of
the approximations. The command plot(H(:,1),H(:,2)) will produce a graph
analogous to Figure 9.8.

1. $y' = t^2 - y$ with $y(0) = 1$, $y(t) = -e^{-t} + t^2 - 2t + 2$

2. $y' = 3y + 3t$ with $y(0) = 1$, $y(t) = \frac{4}{3}e^{3t} - t - \frac{1}{3}$

3. $y' = -ty$ with $y(0) = 1$, $y(t) = e^{-t^2/2}$

4. $y' = e^{-2t} - 2y$ with $y(0) = \frac{1}{10}$, $y(t) = \frac{1}{10}e^{-2t} + te^{-2t}$

5. $y' = 2ty^2$ with $y(0) = 1$, $y(t) = 1/(1 - t^2)$

6. Consider a projectile that is fired straight up and falls straight down. If air resistance
is proportional to the velocity, the I.V.P. for the velocity $v(t)$ is

$v' = -32 - \frac{K}{M}v$ with $v(0) = v_0$,

where $v_0$ is the initial velocity, $M$ is the mass, and $K$ the coefficient of air resistance.
Suppose that $v_0 = 160$ ft/sec and $K/M = 0.1$. Use Heun's method with $h = 0.5$ to
solve

$v' = -32 - 0.1v$ over $[0, 30]$ with $v(0) = 160$.

Graph your computer solution and the exact solution $v(t) = 480e^{-t/10} - 320$ on the
same coordinate system. Observe that the limiting velocity is $-320$ ft/sec.

7. In psychology, the Weber-Fechner law for stimulus-response states that the rate of
change $dR/dS$ of the reaction $R$ is inversely proportional to the stimulus. The
threshold value is the lowest level of the stimulus that can be consistently detected. The
I.V.P. for this model is

$R' = \frac{k}{S}$ with $R(S_0) = 0$.

Suppose that $S_0 = 0.1$ and $R(0.1) = 0$. Use Heun's method with $h = 0.1$ to solve

$R' = \frac{1}{S}$ over $[0.1, 5.1]$ with $R(0.1) = 0$.
8. (a) Write a program to implement the Richardson improvement method discussed
in Exercise 7.

(b) Use your program to approximate $y(2)$ for each of the differential equations in
Problems 1-5 over the interval $[0, 2]$. Use the initial step size $h = 0.05$. The
program should terminate when the absolute value of the difference between
two consecutive Richardson improvements is $< 10^{-6}$.


9.4 Taylor Series Method

The Taylor series method is of general applicability, and it is the standard to which we 
compare the accuracy of the various other numerical methods for solving an I.V.P. It 
can be devised to have any specified degree of accuracy. We start by reformulating 
Taylor’s theorem in a form that is suitable for solving differential equations. 


Theorem 9.5 (Taylor's Theorem). Assume that $y(t) \in C^{N+1}[t_0, b]$ and that $y(t)$
has a Taylor series expansion of order $N$ about the fixed value $t = t_k \in [t_0, b]$:

(1) $y(t_k + h) = y(t_k) + hT_N(t_k, y(t_k)) + O(h^{N+1})$,

where

(2) $T_N(t_k, y(t_k)) = \sum_{j=1}^{N} \frac{y^{(j)}(t_k)}{j!} h^{j-1}$

and $y^{(j)}(t) = f^{(j-1)}(t, y(t))$ denotes the $(j - 1)$st total derivative of the function $f$
with respect to $t$. The formulas for the derivatives can be computed recursively:

(3) $y'(t) = f$

$y''(t) = f_t + f_y y' = f_t + f_y f$

$y^{(3)}(t) = f_{tt} + 2f_{ty}y' + f_y y'' + f_{yy}(y')^2$
$= f_{tt} + 2f_{ty}f + f_{yy}f^2 + f_y(f_t + f_y f)$

$y^{(4)}(t) = f_{ttt} + 3f_{tty}y' + 3f_{tyy}(y')^2 + 3f_{ty}y'' + f_y y''' + 3f_{yy}y'y'' + f_{yyy}(y')^3$
$= (f_{ttt} + 3f_{tty}f + 3f_{tyy}f^2 + f_{yyy}f^3) + f_y(f_{tt} + 2f_{ty}f + f_{yy}f^2)$
$+ 3(f_t + f_y f)(f_{ty} + f_{yy}f) + f_y^2(f_t + f_y f)$

and, in general,

(4) $y^{(N)}(t) = P^{(N-1)} f(t, y(t))$,

where $P$ is the derivative operator

$P = \left(\frac{\partial}{\partial t} + f\frac{\partial}{\partial y}\right)$.

The approximate numerical solution to the I.V.P. $y'(t) = f(t, y)$ over $[t_0, t_M]$
is derived by using formula (1) on each subinterval $[t_k, t_{k+1}]$. The general step for
Taylor's method of order $N$ is

(5) $y_{k+1} = y_k + d_1 h + \frac{d_2 h^2}{2!} + \frac{d_3 h^3}{3!} + \cdots + \frac{d_N h^N}{N!}$,

where $d_j = y^{(j)}(t_k)$ for $j = 1, 2, \ldots, N$ at each step $k = 0, 1, \ldots, M - 1$.

The Taylor method of order $N$ has the property that the final global error (F.G.E.)
is of the order $O(h^N)$, while the local error in a single step is of the order $O(h^{N+1})$;
hence $N$ can be chosen as large as necessary to make this
error as small as desired. If the order $N$ is fixed, it is theoretically possible to a priori
determine the step size $h$ so that the F.G.E. will be as small as desired. However, in
practice we usually compute two sets of approximations using step sizes $h$ and $h/2$ and
compare the results.


Theorem 9.6 (Precision of Taylor's Method of Order N). Assume that $y(t)$ is
the solution to the I.V.P. If $y(t) \in C^{N+1}[t_0, b]$ and $\{(t_k, y_k)\}_{k=0}^{M}$ is the sequence of
approximations generated by Taylor's method of order $N$, then

(6) $|e_k| = |y(t_k) - y_k| = O(h^N)$,

$|\epsilon_{k+1}| = |y(t_{k+1}) - y_k - hT_N(t_k, y_k)| = O(h^{N+1})$.

(7) $E(y(b), h) = |y(b) - y_M| = O(h^N)$.

The proof can be found in Reference [78].

Examples 9.8 and 9.9 illustrate Theorem 9.6 for the case $N = 4$. If approximations
are computed using the step sizes $h$ and $h/2$, we should have

$E(y(b), h) \approx Ch^4$

for the larger step size, and

$E(y(b), \frac{h}{2}) \approx C\frac{h^4}{16} = \frac{1}{16}Ch^4 \approx \frac{1}{16}E(y(b), h)$.

Hence the idea in Theorem 9.6 is that if the step size in the Taylor method of order 4 is
reduced by a factor of $\frac{1}{2}$ the overall F.G.E. will be reduced by about $\frac{1}{16}$.
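Example 9.8. Use the Taylor method of order $N = 4$ to solve $y' = (t - y)/2$ on $[0, 3]$
with $y(0) = 1$. Compare solutions for $h = 1$, $\frac{1}{2}$, $\frac{1}{4}$, and $\frac{1}{8}$.

The derivatives of $y(t)$ must first be determined. Recall that the solution $y(t)$ is a
function of $t$, and differentiate the formula $y'(t) = f(t, y(t))$ with respect to $t$ to get
$y^{(2)}(t)$. Then continue the process to obtain the higher derivatives.

$y'(t) = \frac{t - y}{2}$,

$y^{(2)}(t) = \frac{d}{dt}\left(\frac{t - y}{2}\right) = \frac{1 - y'}{2} = \frac{1 - (t - y)/2}{2} = \frac{2 - t + y}{4}$,

$y^{(3)}(t) = \frac{d}{dt}\left(\frac{2 - t + y}{4}\right) = \frac{0 - 1 + y'}{4} = \frac{-1 + (t - y)/2}{4} = \frac{-2 + t - y}{8}$,

$y^{(4)}(t) = \frac{d}{dt}\left(\frac{-2 + t - y}{8}\right) = \frac{0 + 1 - y'}{8} = \frac{1 - (t - y)/2}{8} = \frac{2 - t + y}{16}$.

To find $y_1$, the derivatives given above must be evaluated at the point $(t_0, y_0) = (0, 1)$.
Calculation reveals that

$d_1 = y'(0) = \frac{0.0 - 1.0}{2} = -0.5$,

$d_2 = y^{(2)}(0) = \frac{2.0 - 0.0 + 1.0}{4} = 0.75$,

$d_3 = y^{(3)}(0) = \frac{-2.0 + 0.0 - 1.0}{8} = -0.375$,

$d_4 = y^{(4)}(0) = \frac{2.0 - 0.0 + 1.0}{16} = 0.1875$.

Next the derivatives $\{d_j\}$ are substituted into (5) with $h = 0.25$, and nested multiplication
is used to compute the value $y_1$:

$y_1 = 1.0 + 0.25\left(-0.5 + 0.25\left(\frac{0.75}{2} + 0.25\left(\frac{-0.375}{6} + 0.25\left(\frac{0.1875}{24}\right)\right)\right)\right)$

$= 0.8974915$.

The computed solution point is $(t_1, y_1) = (0.25, 0.8974915)$.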
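The nested multiplication used for $y_1$ can be reproduced with a short script. This is a Python sketch under stated assumptions: the function names `df` and `taylor4_step` are illustrative, and `df` simply hard-codes the four derivative formulas derived above for $y' = (t - y)/2$.

```python
def df(t, y):
    """Derivatives y', y'', y''', y'''' for y' = (t - y)/2, as derived above."""
    return ((t - y) / 2, (2 - t + y) / 4, (-2 + t - y) / 8, (2 - t + y) / 16)

def taylor4_step(df, t, y, h):
    """One step of Taylor's method of order 4 using nested multiplication."""
    d1, d2, d3, d4 = df(t, y)
    # y + h*d1 + h^2*d2/2! + h^3*d3/3! + h^4*d4/4!, evaluated in nested form
    return y + h * (d1 + h * (d2 / 2 + h * (d3 / 6 + h * d4 / 24)))

y1 = taylor4_step(df, 0.0, 1.0, 0.25)
print(round(y1, 7))  # 0.8974915
```

The nested form needs only four multiplications by $h$ per step and avoids forming the powers $h^2$, $h^3$, $h^4$ separately.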

To determine $y_2$, the derivatives $\{d_j\}$ must now be evaluated at the point $(t_1, y_1) =
(0.25, 0.8974915)$. The calculations are starting to require a considerable amount of
computational effort and are tedious to do by hand. Calculation reveals that

$d_1 = y'(0.25) = \frac{0.25 - 0.8974915}{2} = -0.3237458$,

$d_2 = y^{(2)}(0.25) = \frac{2.0 - 0.25 + 0.8974915}{4} = 0.6618729$,

$d_3 = y^{(3)}(0.25) = \frac{-2.0 + 0.25 - 0.8974915}{8} = -0.3309364$,

$d_4 = y^{(4)}(0.25) = \frac{2.0 - 0.25 + 0.8974915}{16} = 0.1654682$.
Table 9.6 Comparison of the Taylor Solutions of Order N = 4 for y' = (t - y)/2
over [0, 3] with y(0) = 1

t_k      h = 1        h = 1/2      h = 1/4      h = 1/8      y(t_k) Exact
0        1.0          1.0          1.0          1.0          1.0
0.125                                           0.9432392    0.9432392
0.25                               0.8974915    0.8974908    0.8974917
0.375                                           0.8620874    0.8620874
0.50                  0.8364258    0.8364037    0.8364024    0.8364023
0.75                               0.8118696    0.8118679    0.8118678
1.00     0.8203125    0.8196285    0.8195940    0.8195921    0.8195920
1.50                  0.9171423    0.9171021    0.9170998    0.9170997
2.00     1.1045125    1.1036826    1.1036408    1.1036385    1.1036383
2.50                  1.3595575    1.3595168    1.3595145    1.3595144
3.00     1.6701860    1.6694308    1.6693928    1.6693906    1.6693905

Now these derivatives $\{d_j\}$ are substituted into (5) with $h = 0.25$, and nested multiplication
is used to compute the value $y_2$:

$y_2 = 0.8974915 + 0.25\left(-0.3237458 + 0.25\left(\frac{0.6618729}{2} + 0.25\left(\frac{-0.3309364}{6} + 0.25\left(\frac{0.1654682}{24}\right)\right)\right)\right)$

$= 0.8364037$.

The solution point is $(t_2, y_2) = (0.50, 0.8364037)$. Table 9.6 gives solution values at
selected abscissas using various step sizes.


Example 9.9. Compare the F.G.E. for the Taylor solutions to $y' = (t - y)/2$ over $[0, 3]$
with $y(0) = 1$ given in Example 9.8.

Table 9.7 gives the F.G.E. for these step sizes and shows that the error in the
approximation to $y(3)$ decreases by about $\frac{1}{16}$ when the step size is reduced by a factor of $\frac{1}{2}$:

$E(y(3), h) = y(3) - y_M = O(h^4) \approx Ch^4$, where $C = -0.000614$.
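The fourth-order behavior in Example 9.9 can be checked with a short experiment. The Python sketch below is an illustration, not the book's MATLAB program; `taylor4`, `df`, `exact`, and `fge` are assumed names, and `df` hard-codes the derivatives of $y' = (t - y)/2$ from Example 9.8.

```python
import math

def df(t, y):
    """Derivatives y', y'', y''', y'''' for y' = (t - y)/2 (from Example 9.8)."""
    return ((t - y) / 2, (2 - t + y) / 4, (-2 + t - y) / 8, (2 - t + y) / 16)

def taylor4(df, a, b, ya, M):
    """Taylor's method of order 4: return (T, Y) with M+1 points."""
    h = (b - a) / M
    T = [a + k * h for k in range(M + 1)]
    Y = [ya]
    for k in range(M):
        d1, d2, d3, d4 = df(T[k], Y[k])
        # nested form of y + h*d1 + h^2*d2/2 + h^3*d3/6 + h^4*d4/24
        Y.append(Y[k] + h * (d1 + h * (d2 / 2 + h * (d3 / 6 + h * d4 / 24))))
    return T, Y

def exact(t):
    return 3 * math.exp(-t / 2) - 2 + t

def fge(M):
    """Final global error |y(3) - y_M| for M Taylor steps on [0, 3]."""
    _, Y = taylor4(df, 0.0, 3.0, 1.0, M)
    return abs(exact(3.0) - Y[-1])

for M in (3, 6, 12):
    # error with step h, and the ratio of the error at h/2 to the error at h
    print(M, fge(M), fge(2 * M) / fge(M))
```

The ratios should be in the neighborhood of 1/16, moving closer to it as $h$ decreases.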

The following program requires that the derivatives $y'$, $y''$, $y'''$, and $y''''$ be saved
in an M-file named df. For example, the following M-file would save the derivatives
from Example 9.8 in the format required by Program 9.3.

function z=df(t,y)
z=[(t-y)/2 (2-t+y)/4 (-2+t-y)/8 (2-t+y)/16];

Table 9.7 Relation between Step Size and F.G.E. for the Taylor Solutions to
y' = (t - y)/2 over [0, 3]

Step       Number of    Approximation     F.G.E. Error at t = 3,    O(h^4) ≈ Ch^4
size, h    steps, M     to y(3), y_M      y(3) - y_M                where C = -0.000614
1          3            1.6701860         -0.0007955                -0.0006140
1/2        6            1.6694308         -0.0000403                -0.0000384
1/4        12           1.6693928         -0.0000023                -0.0000024
1/8        24           1.6693906         -0.0000001                -0.0000001


Program 9.3 (Taylor's Method of Order 4). To approximate the solution of the
initial value problem $y' = f(t, y)$ with $y(a) = y_0$ over $[a, b]$ by evaluating $y''$, $y'''$,
and $y''''$ and using the Taylor polynomial at each step.

function T4=taylor(df,a,b,ya,M)
%Input  - df=[y' y'' y''' y''''] entered as a string 'df'
%         where y'=f(t,y)
%       - a and b are the left and right end points
%       - ya is the initial condition y(a)
%       - M is the number of steps
%Output - T4=[T' Y'] where T is the vector of abscissas and
%         Y is the vector of ordinates
h=(b-a)/M;
T=zeros(1,M+1);
Y=zeros(1,M+1);
T=a:h:b;
Y(1)=ya;
for j=1:M
   D=feval(df,T(j),Y(j));   % the four derivatives at (t_j, y_j)
   % nested multiplication of the order-4 Taylor polynomial
   Y(j+1)=Y(j)+h*(D(1)+h*(D(2)/2+h*(D(3)/6+h*D(4)/24)));
end
T4=[T' Y'];


Exercises for Taylor Series Method

In Exercises 1 through 5 solve the differential equations by Taylor's method of order $N = 4$.

(a) Let $h = 0.2$ and do two steps by hand calculation. Then let $h = 0.1$ and do four
steps by hand calculation.

(b) Compare the exact solution $y(0.4)$ with the two approximations in part (a).

(c) Does the F.G.E. in part (a) behave as expected when $h$ is halved?

1. $y' = t^2 - y$ with $y(0) = 1$, $y(t) = -e^{-t} + t^2 - 2t + 2$

2. $y' = 3y + 3t$ with $y(0) = 1$, $y(t) = \frac{4}{3}e^{3t} - t - \frac{1}{3}$

3. $y' = -ty$ with $y(0) = 1$, $y(t) = e^{-t^2/2}$

4. $y' = e^{-2t} - 2y$ with $y(0) = \frac{1}{10}$, $y(t) = \frac{1}{10}e^{-2t} + te^{-2t}$

5. $y' = 2ty^2$ with $y(0) = 1$, $y(t) = 1/(1 - t^2)$

6. The Richardson improvement method discussed in Lemma 7.1 (Section 7.3) can be
used in conjunction with Taylor's method. If Taylor's method of order $N = 4$ is
used with step size $h$, then $y(b) \approx y_h + Ch^4$. If Taylor's method of order $N = 4$ is
used with step size $2h$, then $y(b) \approx y_{2h} + 16Ch^4$. The terms involving $Ch^4$ can be
eliminated to obtain an improved approximation for $y(b)$:

$y(b) \approx \frac{16y_h - y_{2h}}{15}$.

This improvement scheme can be used with the values in Example 9.9 to obtain better
approximations to $y(3)$. Find the missing entries in the table below.

h        y_h          (16y_h - y_2h)/15
1.0      1.6701860
0.5      1.6694308
0.25     1.6693928
0.125    1.6693906


7. Show that when Taylor's method of order $N$ is used with step sizes $h$ and $h/2$, then
the overall F.G.E. will be reduced by a factor of about $2^{-N}$ for the smaller step size.

8. Show that Taylor's method fails to approximate the solution $y(t) = t^{3/2}$ of the I.V.P.
$y' = f(t, y) = 1.5y^{1/3}$ with $y(0) = 0$. Justify your answer. What difficulties were
encountered?

9. (a) Verify that the solution to the I.V.P. $y' = y^2$, $y(0) = 1$ over the interval $[0, 1)$ is
$y(t) = 1/(1 - t)$.

(b) Verify that the solution to the I.V.P. $y' = 1 + y^2$, $y(0) = 1$ over the interval
$[0, \pi/4)$ is $y(t) = \tan(t + \pi/4)$.

(c) Use the results of parts (a) and (b) to argue that the solution to the I.V.P. $y' =
t^2 + y^2$, $y(0) = 1$ has a vertical asymptote between $\pi/4$ and 1. (Its location is
near $t = 0.96981$.)

10. Consider the I.V.P. $y' = 1 + y^2$, $y(0) = 0$.

(a) Find an expression for $y^{(2)}(t)$, $y^{(3)}(t)$, and $y^{(4)}(t)$.

(b) Evaluate the derivatives at $t = 0$, and use them to find the first five terms in the
Maclaurin expansion for $\tan(t)$.

Algorithms and Programs

In Problems 1 through 5 solve the differential equations by Taylor's method of order $N = 4$.

(a) Let $h = 0.1$ and do 20 steps with Program 9.3. Then let $h = 0.05$ and do 40 steps
with Program 9.3.

(b) Compare the exact solution $y(2)$ with the two approximations in part (a).

(c) Does the F.G.E. in part (a) behave as expected when $h$ is halved?

(d) Plot the two approximations and the exact solution on the same coordinate system.
Hint. The output matrix T4 from Program 9.3 contains the x and y coordinates of
the approximations. The command plot(T4(:,1),T4(:,2)) will produce a graph
analogous to Figure 9.6.

1. $y' = t^2 - y$ with $y(0) = 1$, $y(t) = -e^{-t} + t^2 - 2t + 2$

2. $y' = 3y + 3t$ with $y(0) = 1$, $y(t) = \frac{4}{3}e^{3t} - t - \frac{1}{3}$

3. $y' = -ty$ with $y(0) = 1$, $y(t) = e^{-t^2/2}$

4. $y' = e^{-2t} - 2y$ with $y(0) = \frac{1}{10}$, $y(t) = \frac{1}{10}e^{-2t} + te^{-2t}$

5. $y' = 2ty^2$ with $y(0) = 1$, $y(t) = 1/(1 - t^2)$

6. (a) Write a program to implement the Richardson improvement method discussed
in Exercise 6.

(b) Use your program from part (a) to approximate $y(0.8)$ for the I.V.P. $y' = t^2 + y^2$,
$y(0) = 1$ over $[0, 0.8]$. The true solution at $t = 0.8$ is known to be $y(0.8) =
5.8486168$. Start with the step size $h = 0.05$. The program should terminate
when the absolute value of the difference between two consecutive Richardson
improvements is $< 10^{-6}$.

7. (a) Modify Program 9.3 to carry out Taylor's method of order $N = 3$.

(b) Use your program from part (a) to solve the I.V.P. $y' = t^2 + y^2$, $y(0) = 1$ over
$[0, 0.8]$. Find approximate solutions for the step sizes $h = 0.05$, $0.025$, $0.0125$,
and $0.00625$. Plot the four approximations on the same coordinate system.


9.5 Runge-Kutta Methods 

The Taylor methods in the preceding section have the desirable feature that the F.G.E.
is of order $O(h^N)$, and N can be chosen large so that this error is small. However, the
shortcomings of the Taylor methods are the a priori determination of N and the
computation of the higher derivatives, which can be very complicated. Each Runge-Kutta
method is derived from an appropriate Taylor method in such a way that the F.G.E. is of
order $O(h^N)$. A trade-off is made to perform several function evaluations at each step
and eliminate the necessity to compute the higher derivatives. These methods can be
constructed for any order N. The Runge-Kutta method of order N = 4 is the most popular.
It is a good choice for common purposes because it is quite accurate, stable, and easy
to program. Most authorities proclaim that it is not necessary to go to a higher-order
method because the increased accuracy is offset by additional computational effort. If
more accuracy is required, then either a smaller step size or an adaptive method should
be used.

The fourth-order Runge-Kutta method (RK4) simulates the accuracy of the Taylor
series method of order N = 4. The method is based on computing $y_{k+1}$ as follows:

(1) $y_{k+1} = y_k + w_1 k_1 + w_2 k_2 + w_3 k_3 + w_4 k_4,$

where $k_1$, $k_2$, $k_3$, and $k_4$ have the form

$k_1 = hf(t_k, y_k),$

$k_2 = hf(t_k + a_1 h, y_k + b_1 k_1),$

$k_3 = hf(t_k + a_2 h, y_k + b_2 k_1 + b_3 k_2),$

$k_4 = hf(t_k + a_3 h, y_k + b_4 k_1 + b_5 k_2 + b_6 k_3).$

By matching coefficients with those of the Taylor series method of order N = 4 so that
the local truncation error is of order $O(h^5)$, Runge and Kutta were able to obtain the …

(7) $f_1 = f(t_k, y_k),$

$f_2 = f\left(t_k + \frac{h}{2},\, y_k + \frac{h}{2}f_1\right),$

$f_3 = f\left(t_k + \frac{h}{2},\, y_k + \frac{h}{2}f_2\right),$

$f_4 = f(t_k + h,\, y_k + hf_3).$


Discussion about the Method 

The complete development of the equations in (7) is beyond the scope of this book and
can be found in advanced texts, but we can get some insights. Consider the graph of
the solution curve $y = y(t)$ over the first subinterval $[t_0, t_1]$. The function values in
(7) are approximations for slopes to this curve. Here $f_1$ is the slope at the left, $f_2$ and
$f_3$ are two estimates for the slope in the middle, and $f_4$ is the slope at the right (see
Figure 9.9(a)). The next point $(t_1, y_1)$ is obtained by integrating the slope function

(8) $y(t_1) - y(t_0) = \int_{t_0}^{t_1} f(t, y(t))\,dt.$

If Simpson's rule is applied with step size h/2, the approximation to the integral
in (8) is

(9) $\int_{t_0}^{t_1} f(t, y(t))\,dt \approx \frac{h}{6}\left(f(t_0, y(t_0)) + 4f(t_{1/2}, y(t_{1/2})) + f(t_1, y(t_1))\right),$

where $t_{1/2}$ is the midpoint of the interval. Three function values are needed; hence we
make the obvious choice $f(t_0, y(t_0)) = f_1$ and $f(t_1, y(t_1)) \approx f_4$. For the value in the
middle we chose the average of $f_2$ and $f_3$:

$f(t_{1/2}, y(t_{1/2})) \approx \frac{f_2 + f_3}{2}.$

These values are substituted into (9), which is used in equation (8) to get $y_1$:

(10) $y_1 = y_0 + \frac{h}{6}\left(f_1 + \frac{4(f_2 + f_3)}{2} + f_4\right).$

When this formula is simplified, it is seen to be equation (6) with k = 0. The graph
for the integral in (9) is shown in Figure 9.9(b).
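
The connection with Simpson's rule can be checked numerically. The following Python
sketch (Python is used here for illustration instead of the MATLAB of the book's
programs; the names `rk4_step` and `simpson` and the test problem y' = cos(t) are
assumptions, not taken from the text) applies one RK4 step built from the slopes
f_1, ..., f_4 to a right-hand side that does not depend on y. In that case f_2 = f_3, so
the step should agree with Simpson's rule on [t_0, t_1] up to rounding.

```python
import math

def rk4_step(f, t, y, h):
    """One RK4 step written with the four slopes f1..f4."""
    f1 = f(t, y)
    f2 = f(t + h / 2, y + h / 2 * f1)
    f3 = f(t + h / 2, y + h / 2 * f2)
    f4 = f(t + h, y + h * f3)
    return y + h * (f1 + 2 * f2 + 2 * f3 + f4) / 6

def simpson(g, a, b):
    """Simpson's rule on [a, b], i.e. with step size (b - a)/2."""
    return (b - a) / 6 * (g(a) + 4 * g((a + b) / 2) + g(b))

# Hypothetical test problem: y' = cos(t), y(0) = 0, one step of size h = 0.5.
f = lambda t, y: math.cos(t)
h = 0.5
y1 = rk4_step(f, 0.0, 0.0, h)   # RK4 approximation to y(0.5) = sin(0.5)
s1 = simpson(math.cos, 0.0, h)  # Simpson approximation to the same integral
print(y1, s1, math.sin(h))
```

The two approximations should coincide, and both should differ from sin(0.5) only by
the Simpson error, which is of order h^5.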
Figure 9.9 The graphs $y = y(t)$ and $z = f(t, y(t))$ in the discussion of the Runge-Kutta
method of order N = 4. (a) Predicted slopes $m_j$ to the solution curve $y = y(t)$.
(b) Integral approximation: $y(t_1) - y_0 = \frac{h}{6}(f_1 + 2f_2 + 2f_3 + f_4)$.

Step Size versus Error 

The error term for Simpson's rule with step size h/2 is

(11) $-y^{(4)}(c_1)\,\frac{h^5}{2880}.$

If the only error at each step is that given in (11), after M steps the accumulated error
for the RK4 method would be

(12) $-\sum_{k=1}^{M} y^{(4)}(c_k)\,\frac{h^5}{2880} \approx -\frac{b - a}{5760}\,y^{(4)}(c)\,h^4 \approx O(h^4).$

The next theorem states the relationship between F.G.E. and step size. It is used
to give us an idea of how much computing effort must be done when using the RK4
method.

Theorem 9.7 (Precision of the Runge-Kutta Method). Assume that $y(t)$ is the
solution to the I.V.P. If $y(t) \in C^5[t_0, b]$ and $\{(t_k, y_k)\}_{k=0}^{M}$ is the sequence of
approximations generated by the Runge-Kutta method of order 4, then

(13) $|e_k| = |y(t_k) - y_k| = O(h^4),$

$|\epsilon_{k+1}| = |y(t_{k+1}) - y_k - hT_N(t_k, y_k)| = O(h^5).$

In particular, the F.G.E. at the end of the interval will satisfy

(14) $E(y(b), h) = |y(b) - y_M| = O(h^4).$

Examples 9.10 and 9.11 illustrate Theorem 9.7. If approximations are computed
using the step sizes h and h/2, we should have

(15) $E(y(b), h) \approx Ch^4$

for the larger step size, and

(16) $E\left(y(b), \frac{h}{2}\right) \approx C\,\frac{h^4}{16} = \frac{1}{16}Ch^4 \approx \frac{1}{16}E(y(b), h).$

Hence the idea in Theorem 9.7 is that if the step size in the RK4 method is reduced by
a factor of 1/2, we can expect that the overall F.G.E. will be reduced by a factor of 1/16.

Example 9.10. Use the RK4 method to solve the I.V.P. $y' = (t - y)/2$ on [0, 3] with
$y(0) = 1$. Compare solutions for h = 1, 1/2, 1/4, and 1/8.

Table 9.8 gives the solution values at selected abscissas. For the step size h = 0.25, a
sample calculation is

$f_1 = \frac{0.0 - 1.0}{2} = -0.5,$

$f_2 = \frac{0.125 - (1 + 0.25(0.5)(-0.5))}{2} = -0.40625,$

$f_3 = \frac{0.125 - (1 + 0.25(0.5)(-0.40625))}{2} = -0.4121094,$

$f_4 = \frac{0.25 - (1 + 0.25(-0.4121094))}{2} = -0.3234863,$

$y_1 = 1.0 + 0.25\left(\frac{-0.5 + 2(-0.40625) + 2(-0.4121094) - 0.3234863}{6}\right) = 0.8974915.$ ■
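
The sample calculation can be reproduced in a few lines. The Python sketch below is
illustrative only (the variable names are assumptions, and Python stands in for the
book's MATLAB); it evaluates the four slopes for the first step with h = 0.25 and forms
y_1 as the weighted average used in the sample calculation.

```python
def f(t, y):
    # Right-hand side of the I.V.P. y' = (t - y)/2 from Example 9.10
    return (t - y) / 2

h = 0.25
t0, y0 = 0.0, 1.0

f1 = f(t0, y0)                       # slope at the left end point
f2 = f(t0 + h / 2, y0 + h / 2 * f1)  # first midpoint slope
f3 = f(t0 + h / 2, y0 + h / 2 * f2)  # second midpoint slope
f4 = f(t0 + h, y0 + h * f3)          # slope at the right end point

y1 = y0 + h * (f1 + 2 * f2 + 2 * f3 + f4) / 6
print(f1, f2, f3, f4, y1)
```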


Example 9.11. Compare the F.G.E. when the RK4 method is used to solve $y' = (t - y)/2$
over [0, 3] with $y(0) = 1$ using step sizes 1, 1/2, 1/4, and 1/8.

Table 9.9 gives the F.G.E. for the various step sizes and shows that the error in the
approximation to y(3) decreases by about 1/16 when the step size is reduced by a factor
of 1/2:

$E(y(3), h) = y(3) - y_M = O(h^4) \approx Ch^4$, where $C = -0.000614$. ■
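
The factor of 1/16 can be observed directly. The sketch below is a hedged Python
illustration (the function name `rk4` mirrors Program 9.4, but this is not the book's
MATLAB code). It assumes the exact solution y(t) = 3e^{-t/2} - 2 + t, which can be
verified by substitution into the I.V.P., and prints the error at t = 3 for M = 3, 6,
12, and 24 steps together with the ratios of successive errors.

```python
import math

def rk4(f, a, b, ya, M):
    """Return the RK4 approximation to y(b) using M steps of size (b - a)/M."""
    h = (b - a) / M
    t, y = a, ya
    for _ in range(M):
        k1 = h * f(t, y)
        k2 = h * f(t + h / 2, y + k1 / 2)
        k3 = h * f(t + h / 2, y + k2 / 2)
        k4 = h * f(t + h, y + k3)
        y += (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

f = lambda t, y: (t - y) / 2
exact = 3 * math.exp(-1.5) + 1       # y(3) for y(t) = 3 e^{-t/2} - 2 + t

errors = []
for M in (3, 6, 12, 24):             # step sizes h = 1, 1/2, 1/4, 1/8
    yM = rk4(f, 0.0, 3.0, 1.0, M)
    errors.append(exact - yM)
    print(M, yM, errors[-1])

# Each ratio should be roughly 16 when h is halved.
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
print(ratios)
```

For the coarsest step sizes the ratio is somewhat larger than 16, since the estimate
Ch^4 only describes the leading term of the error.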

A comparison of Examples 9.10 and 9.11 and Examples 9.8 and 9.9 shows what is
meant by the statement “The RK4 method simulates the Taylor series method of order
Table 9.8 Comparison of the RK4 Solutions with Different Step Sizes for y' = (t - y)/2
over [0, 3] with y(0) = 1

t_k      h = 1       h = 1/2     h = 1/4     h = 1/8     y(t_k) Exact
0        1.0         1.0         1.0         1.0         1.0
0.125                                        0.9432392   0.9432392
0.25                             0.8974915   0.8974908
0.375                                        0.8620874   0.8620874
0.50                             0.8364037   0.8364024   0.8364023
0.75                             0.8118696   0.8118679   0.8118678
1.00                 0.8196285   0.8195940   0.8195921   0.8195920
1.50                 0.9171423   0.9171021   0.9170998   0.9170997
2.00                 1.1036826   1.1036408   1.1036385   1.1036383
2.50                 1.3595575   1.3595168   1.3595145   1.3595144
3.00     1.6701860   1.6694308   1.6693928   1.6693906   1.6693905

Table 9.9 Relation between Step Size and F.G.E. for the RK4 Solutions to
y' = (t - y)/2 over [0, 3] with y(0) = 1

Step      Number of   Approximation     F.G.E. Error at t = 3,   O(h^4) ≈ Ch^4, where
size, h   steps, M    to y(3), y_M      y(3) - y_M               C = -0.000614
1         3           1.6701860         -0.0007955               -0.0006140
1/2       6           1.6694308         -0.0000403               -0.0000384
1/4       12          1.6693928         -0.0000023               -0.0000024
1/8       24          1.6693906         -0.0000001               -0.0000001


N = 4.” For these examples, the two methods generate identical solution sets $\{(t_k, y_k)\}$
over the given interval. The advantage of the RK4 method is obvious; no formulas for
the higher derivatives need to be computed nor do they have to be in the program.

It is not easy to determine the accuracy to which a Runge-Kutta solution has been
computed. We could estimate the size of $y^{(4)}(c)$ and use formula (12). Another way
is to repeat the algorithm using a smaller step size and compare results. A third way is
to adaptively determine the step size, which is done in Program 9.5. In Section 9.6 we
will see how to change the step size for a multistep method.


Runge-Kutta Methods of Order N = 2

The second-order Runge-Kutta method (denoted RK2) simulates the accuracy of the
Taylor series method of order 2. Although this method is not as good to use as the
RK4 method, its proof is easier to understand and illustrates the principles involved.
To start, we write down the Taylor series formula for $y(t + h)$:

(17) $y(t + h) = y(t) + hy'(t) + \frac{1}{2}h^2 y''(t) + C_T h^3 + \cdots,$

where $C_T$ is a constant involving the third derivative of $y(t)$ and the other terms in the
series involve powers of $h^j$ for $j > 3$.

The derivatives $y'(t)$ and $y''(t)$ in equation (17) must be expressed in terms of
$f(t, y)$ and its partial derivatives. Recall that

(18) $y'(t) = f(t, y).$

The chain rule for differentiating a function of two variables can be used to
differentiate (18) with respect to t, and the result is

(19) $y''(t) = f_t(t, y) + f_y(t, y)\,y'(t) = f_t(t, y) + f_y(t, y)\,f(t, y).$

The derivatives (18) and (19) are substituted in (17) to give the Taylor expression
for $y(t + h)$:

(20) $y(t + h) = y(t) + hf(t, y) + \frac{1}{2}h^2 f_t(t, y) + \frac{1}{2}h^2 f_y(t, y)\,f(t, y) + C_T h^3 + \cdots.$

Now consider the Runge-Kutta method of order N = 2, which uses a linear
combination of two function values to express $y(t + h)$:

(21) $y(t + h) = y(t) + Ahf_0 + Bhf_1,$

where

(22) $f_0 = f(t, y),$

$f_1 = f(t + Ph,\, y + Qhf_0).$

Next the Taylor polynomial approximation for a function of two independent
variables is used to expand $f(t, y)$ (see the exercises). This gives the following
representation for $f_1$:

(23) $f_1 = f(t, y) + Phf_t(t, y) + Qhf_y(t, y)\,f(t, y) + C_P h^2 + \cdots,$

where $C_P$ involves the second-order partial derivatives of $f(t, y)$. Then (23) is used
in (21) to get the RK2 expression for $y(t + h)$:

(24) $y(t + h) = y(t) + (A + B)hf(t, y) + BPh^2 f_t(t, y) + BQh^2 f_y(t, y)\,f(t, y) + BC_P h^3 + \cdots.$

A comparison of similar terms in equations (20) and (24) will produce the following
conclusions:

$hf(t, y) = (A + B)hf(t, y)$ implies that $1 = A + B$,

$\frac{1}{2}h^2 f_t(t, y) = BPh^2 f_t(t, y)$ implies that $\frac{1}{2} = BP$,

$\frac{1}{2}h^2 f_y(t, y)\,f(t, y) = BQh^2 f_y(t, y)\,f(t, y)$ implies that $\frac{1}{2} = BQ$.

Hence, if we require that A, B, P, and Q satisfy the relations

(25) $A + B = 1, \quad BP = \frac{1}{2}, \quad BQ = \frac{1}{2},$

then the RK2 method in (24) will have the same order of accuracy as the Taylor's
method in (20).

Since there are only three equations in four unknowns, the system of equations (25)
is underdetermined, and we are permitted to choose one of the coefficients. There are
several special choices that have been studied in the literature; we mention two of them.

Case (i): Choose $A = \frac{1}{2}$. This choice leads to $B = \frac{1}{2}$, $P = 1$, and $Q = 1$. If
equation (21) is written with these parameters, the formula is

(26) $y(t + h) = y(t) + \frac{h}{2}\left(f(t, y) + f(t + h,\, y + hf(t, y))\right).$

When this scheme is used to generate $\{(t_k, y_k)\}$, the result is Heun's method.

Case (ii): Choose $A = 0$. This choice leads to $B = 1$, $P = \frac{1}{2}$, and $Q = \frac{1}{2}$. If
equation (21) is written with these parameters, the formula is

(27) $y(t + h) = y(t) + hf\left(t + \frac{h}{2},\, y + \frac{h}{2}f(t, y)\right).$

When this scheme is used to generate $\{(t_k, y_k)\}$, it is called the modified Euler-Cauchy
method.
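
Both special cases are easy to try out. The following Python sketch is an illustration
under stated assumptions rather than one of the book's programs: the function names are
hypothetical, the test problem y' = (t - y)/2, y(0) = 1 on [0, 3] is borrowed from
Example 9.10, and its exact solution y(t) = 3e^{-t/2} - 2 + t is assumed (it can be
verified by substitution). Halving h should reduce the final error by a factor of
about 4, as expected for a method of order 2.

```python
import math

def heun_step(f, t, y, h):
    # Case (i): A = B = 1/2, P = Q = 1
    f0 = f(t, y)
    return y + h / 2 * (f0 + f(t + h, y + h * f0))

def modified_euler_cauchy_step(f, t, y, h):
    # Case (ii): A = 0, B = 1, P = Q = 1/2
    return y + h * f(t + h / 2, y + h / 2 * f(t, y))

def solve(step, f, a, b, ya, M):
    """Apply M steps of the given one-step method and return the value at b."""
    h = (b - a) / M
    t, y = a, ya
    for _ in range(M):
        y = step(f, t, y, h)
        t += h
    return y

f = lambda t, y: (t - y) / 2
exact = 3 * math.exp(-1.5) + 1   # y(3) for y(t) = 3 e^{-t/2} - 2 + t

results = {}
for step in (heun_step, modified_euler_cauchy_step):
    e1 = abs(exact - solve(step, f, 0.0, 3.0, 1.0, 24))   # h = 1/8
    e2 = abs(exact - solve(step, f, 0.0, 3.0, 1.0, 48))   # h = 1/16
    results[step.__name__] = e1 / e2
    print(step.__name__, e1, e2, e1 / e2)
```

For this particular right-hand side, which is linear in both t and y, the two schemes
produce the same numbers; they differ in general.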
Runge-Kutta-Fehlberg Method (RKF45)

One way to guarantee accuracy in the solution of an I.V.P. is to solve the problem twice
using step sizes h and h/2 and compare answers at the mesh points corresponding to
the larger step size. But this requires a significant amount of computation for the
smaller step size and must be repeated if it is determined that the agreement is not
good enough.

The Runge-Kutta-Fehlberg method (denoted RKF45) is one way to try to resolve
this problem. It has a procedure to determine if the proper step size h is being used. At
each step, two different approximations for the solution are made and compared. If the
two answers are in close agreement, the approximation is accepted. If the two answers
do not agree to a specified accuracy, the step size is reduced. If the answers agree to
more significant digits than required, the step size is increased.

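Each step requires the use of the following six values:

(28) $k_1 = hf(t_k, y_k),$

$k_2 = hf\left(t_k + \frac{1}{4}h,\, y_k + \frac{1}{4}k_1\right),$

$k_3 = hf\left(t_k + \frac{3}{8}h,\, y_k + \frac{3}{32}k_1 + \frac{9}{32}k_2\right),$

$k_4 = hf\left(t_k + \frac{12}{13}h,\, y_k + \frac{1932}{2197}k_1 - \frac{7200}{2197}k_2 + \frac{7296}{2197}k_3\right),$

$k_5 = hf\left(t_k + h,\, y_k + \frac{439}{216}k_1 - 8k_2 + \frac{3680}{513}k_3 - \frac{845}{4104}k_4\right),$

$k_6 = hf\left(t_k + \frac{1}{2}h,\, y_k - \frac{8}{27}k_1 + 2k_2 - \frac{3544}{2565}k_3 + \frac{1859}{4104}k_4 - \frac{11}{40}k_5\right).$

Then an approximation to the solution of the I.V.P. is made using a Runge-Kutta
method of order 4:

(29) $y_{k+1} = y_k + \frac{25}{216}k_1 + \frac{1408}{2565}k_3 + \frac{2197}{4104}k_4 - \frac{1}{5}k_5,$

where the four function values $k_1$, $k_3$, $k_4$, and $k_5$ are used. Notice that $k_2$ is not used
in formula (29). A better value for the solution is determined using a Runge-Kutta
method of order 5:

(30) $z_{k+1} = y_k + \frac{16}{135}k_1 + \frac{6656}{12825}k_3 + \frac{28561}{56430}k_4 - \frac{9}{50}k_5 + \frac{2}{55}k_6.$

The optimal step size sh can be determined by multiplying the scalar s times the
current step size h. The scalar s is

(31) $s = \left(\frac{\text{Tol}\,h}{2|z_{k+1} - y_{k+1}|}\right)^{1/4} \approx 0.84\left(\frac{\text{Tol}\,h}{|z_{k+1} - y_{k+1}|}\right)^{1/4},$

where Tol is the specified error control tolerance.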

The derivation of formula (31) can be found in advanced books on numerical
analysis. It is important to learn that a fixed step size is not the best strategy even though
it would give a nicer appearing table of values. If values are needed that are not in the
table, polynomial interpolation should be used.
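
A single RKF45 step following (28) through (31) might be sketched as below. This is a
Python illustration, not the book's Program 9.5; the function name `rkf45_step`, its
return convention, and the trial problem y' = 1 + y^2 with h = 0.2 are assumptions made
for the example. The function returns the order-4 value, the order-5 value, and the
scalar s, leaving the decision to accept the step or change h to the caller.

```python
import math

def rkf45_step(f, t, y, h, tol):
    """Return (y4, z5, s): order-4 value, order-5 value, and step-size scalar s."""
    k1 = h * f(t, y)
    k2 = h * f(t + h / 4, y + k1 / 4)
    k3 = h * f(t + 3 * h / 8, y + 3 / 32 * k1 + 9 / 32 * k2)
    k4 = h * f(t + 12 * h / 13,
               y + 1932 / 2197 * k1 - 7200 / 2197 * k2 + 7296 / 2197 * k3)
    k5 = h * f(t + h,
               y + 439 / 216 * k1 - 8 * k2 + 3680 / 513 * k3 - 845 / 4104 * k4)
    k6 = h * f(t + h / 2,
               y - 8 / 27 * k1 + 2 * k2 - 3544 / 2565 * k3
               + 1859 / 4104 * k4 - 11 / 40 * k5)
    y4 = y + 25 / 216 * k1 + 1408 / 2565 * k3 + 2197 / 4104 * k4 - k5 / 5
    z5 = (y + 16 / 135 * k1 + 6656 / 12825 * k3 + 28561 / 56430 * k4
          - 9 / 50 * k5 + 2 / 55 * k6)
    err = abs(z5 - y4)
    # Scalar s from (31); an exact agreement is treated here as "no limit on s".
    s = float("inf") if err == 0 else (tol * h / (2 * err)) ** 0.25
    return y4, z5, s

f = lambda t, y: 1 + y * y           # trial problem with solution tan(t)
y4, z5, s = rkf45_step(f, 0.0, 0.0, 0.2, 2e-5)
print(y4, z5, s, math.tan(0.2))
```

A value s < 1 suggests that the step size should be reduced, and s > 1 that it could be
increased.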

Example 9.12. Compare RKF45 and RK4 solutions to the I.V.P.

$y' = 1 + y^2$ with $y(0) = 0$ on [0, 1.4].

An RKF45 program was used with the value Tol = $2 \times 10^{-5}$ for the error control
tolerance. It automatically changed the step size and generated the 10 approximations to
the solution in Table 9.10. An RK4 program was used with the a priori step size of h = 0.1,
which required the computer to generate 14 approximations at the equally spaced points in
Table 9.11. The approximations at the right end point are

$y(1.4) \approx y_{10} = 5.7985045$ and $y(1.4) \approx y_{14} = 5.7919748$

and the errors are

$E_{10} = -0.0006208$ and $E_{14} = 0.0059089$

for the RKF45 and RK4 methods, respectively. The RKF45 method has the smaller
error. ■
Table 9.11 RK4 Solution to y' = 1 + y^2, y(0) = 0

                RK4 approximation   True solution,           Error
k      t_k      y_k                 y(t_k) = tan(t_k)        y(t_k) - y_k
0      0.0      0.0000000           0.0000000                 0.0000000
1      0.1      0.1003346           0.1003347                 0.0000001
2      0.2      0.2027099           0.2027100                 0.0000001
3      0.3      0.3093360           0.3093362                 0.0000002
4      0.4      0.4227930           0.4227932                 0.0000002
5      0.5      0.5463023           0.5463025                 0.0000002
6      0.6      0.6841368           0.6841368                 0.0000000
7      0.7      0.8422886           0.8422884                -0.0000002
8      0.8      1.0296391           1.0296386                -0.0000005
9      0.9      1.2601588           1.2601582                -0.0000006
10     1.0      1.5574064           1.5574077                 0.0000013
11     1.1      1.9647466           1.9647597                 0.0000131
12     1.2      2.5720718           2.5721516                 0.0000798
13     1.3      3.6015634           3.6021024                 0.0005390
14     1.4      5.7919748           5.7978837                 0.0059089
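
The entries of Table 9.11 can be regenerated with a short fixed-step RK4 loop. The
Python sketch below is an illustration under assumptions (it mirrors the structure of
Program 9.4 but is not the book's MATLAB code, and the name `rk4_table` is
hypothetical). It prints k, t_k, y_k, tan(t_k), and the error for h = 0.1 on [0, 1.4],
and shows how quickly the error grows as t approaches the singularity of tan(t) at
t = π/2.

```python
import math

def rk4_table(f, a, b, ya, M):
    """Return lists T and Y of abscissas and RK4 ordinates for M equal steps."""
    h = (b - a) / M
    T = [a + j * h for j in range(M + 1)]
    Y = [ya]
    for j in range(M):
        k1 = h * f(T[j], Y[j])
        k2 = h * f(T[j] + h / 2, Y[j] + k1 / 2)
        k3 = h * f(T[j] + h / 2, Y[j] + k2 / 2)
        k4 = h * f(T[j] + h, Y[j] + k3)
        Y.append(Y[j] + (k1 + 2 * k2 + 2 * k3 + k4) / 6)
    return T, Y

f = lambda t, y: 1 + y * y
T, Y = rk4_table(f, 0.0, 1.4, 0.0, 14)
for k, (t, y) in enumerate(zip(T, Y)):
    print(k, round(t, 1), y, math.tan(t), math.tan(t) - y)
```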


Program 9.4 (Runge-Kutta Method of Order 4). To approximate the solution
of the initial value problem y' = f(t, y) with y(a) = y_0 over [a, b] by using the
formula

$y_{k+1} = y_k + \frac{1}{6}(k_1 + 2k_2 + 2k_3 + k_4),$ where each $k_j$ includes the factor h.

function R=rk4(f,a,b,ya,M)
%Input  - f is the function entered as a string 'f'
%       - a and b are the left and right end points
%       - ya is the initial condition y(a)
%       - M is the number of steps
%Output - R=[T' Y'] where T is the vector of abscissas
%         and Y is the vector of ordinates
h=(b-a)/M;
T=zeros(1,M+1);
Y=zeros(1,M+1);
T=a:h:b;
Y(1)=ya;
for j=1:M
   %Each k already contains the factor h
   k1=h*feval(f,T(j),Y(j));
   k2=h*feval(f,T(j)+h/2,Y(j)+k1/2);
   k3=h*feval(f,T(j)+h/2,Y(j)+k2/2);
   k4=h*feval(f,T(j)+h,Y(j)+k3);
   Y(j+1)=Y(j)+(k1+2*k2+2*k3+k4)/6;
end
R=[T' Y'];

The following program implements the Runge-Kutta-Fehlberg Method (RKF45) 
described in (28) through (31). 


Program 9.5 (Runge-Kutta-Fehlberg Method (RKF45)). To approximate the
solution of the initial value problem y' = f(t, y) with y(a) = y_0 over [a, b] with
an error control and step-size method.

function R=rkf45(f,a,b,ya,M,tol)
%Input  - f is the function entered as a string 'f'
%       - a and b are the left and right end points
%       - ya is the initial condition y(a)
%       - M is the number of steps
%       - tol is the tolerance
%Output - R=[T' Y'] where T is the vector of abscissas
%         and Y is the vector of ordinates
%Enter the coefficients necessary to calculate the
%values in (28) and (29)
a2=1/4;b2=1/4;a3=3/8;b3=3/32;c3=9/32;a4=12/13;
b4=1932/2197;c4=-7200/2197;d4=7296/2197;a5=1;
b5=439/216;c5=-8;d5=3680/513;e5=-845/4104;a6=1/2;
b6=-8/27;c6=2;d6=-3544/2565;e6=1859/4104;
f6=-11/40;r1=1/360;r3=-128/4275;r4=-2197/75240;r5=1/50;
r6=2/55;n1=25/216;n3=1408/2565;n4=2197/4104;n5=-1/5;
big=1e15;
h=(b-a)/M;
hmin=h/64;
hmax=64*h;
max1=200;
Y(1)=ya;
T(1)=a;
j=1;
br=b-0.00001*abs(b);
while (T(j)<b)
   if ((T(j)+h)>br)
      h=b-T(j);
   end
   %Calculation of values in (28) and (29)
   k1=h*feval(f,T(j),Y(j));
   y2=Y(j)+b2*k1;
   if big<abs(y2) break,end
   k2=h*feval(f,T(j)+a2*h,y2);
   y3=Y(j)+b3*k1+c3*k2;
   if big<abs(y3) break,end
   k3=h*feval(f,T(j)+a3*h,y3);
   y4=Y(j)+b4*k1+c4*k2+d4*k3;
   if big<abs(y4) break,end
   k4=h*feval(f,T(j)+a4*h,y4);
   y5=Y(j)+b5*k1+c5*k2+d5*k3+e5*k4;
   if big<abs(y5) break,end
   k5=h*feval(f,T(j)+a5*h,y5);
   y6=Y(j)+b6*k1+c6*k2+d6*k3+e6*k4+f6*k5;
   if big<abs(y6) break,end
   k6=h*feval(f,T(j)+a6*h,y6);
   %err is the difference between the order 5 and order 4 values
   err=abs(r1*k1+r3*k3+r4*k4+r5*k5+r6*k6);
   ynew=Y(j)+n1*k1+n3*k3+n4*k4+n5*k5;
   %Error and step size control
   if ((err<tol)|(h<2*hmin))
      Y(j+1)=ynew;
      if ((T(j)+h)>br)
         T(j+1)=b;
      else
         T(j+1)=T(j)+h;
      end
      j=j+1;
   end
   if (err==0)
      s=0;
   else
      s=0.84*(tol*h/err)^(0.25);
   end
   if ((s<0.75)&(h>2*hmin))
      h=h/2;
   end
   if ((s>1.50)&(2*h<hmax))
      h=2*h;
   end
   if ((big<abs(Y(j)))|(max1==j)),break,end
   M=j;
   if (b>T(j))
      M=j+1;
   else
      M=j;
   end
end
R=[T' Y'];


Exercises for Runge-Kutta Methods

In Exercises 1 through 5, solve the differential equations by the Runge-Kutta method of
order N = 4.

(a) Let h = 0.2 and do two steps by hand calculation. Then let h = 0.1 and do four
steps by hand calculation.

(b) Compare the exact solution y(0.4) with the two approximations in part (a).

(c) Does the F.G.E. in part (a) behave as expected when h is halved?

1. $y' = t^2 - y$ with $y(0) = 1$, $y(t) = -e^{-t} + t^2 - 2t + 2$

2. $y' = 3y + 3t$ with $y(0) = 1$, $y(t) = \frac{4}{3}e^{3t} - t - \frac{1}{3}$

3. $y' = -ty$ with $y(0) = 1$, $y(t) = e^{-t^2/2}$

4. $y' = e^{-2t} - 2y$ with $y(0) = \frac{1}{10}$, $y(t) = \frac{1}{10}e^{-2t} + te^{-2t}$

5. $y' = 2ty^2$ with $y(0) = 1$, $y(t) = 1/(1 - t^2)$

6. Show that when the Runge-Kutta method of order N = 4 is used to solve the I.V.P.
$y' = f(t)$ over [a, b] with $y(a) = 0$, the result is

$y(b) \approx \frac{h}{6}\sum_{k=0}^{M-1}\left(f(t_k) + 4f(t_{k+1/2}) + f(t_{k+1})\right),$

where $h = (b - a)/M$, $t_k = a + kh$, and $t_{k+1/2} = a + \left(k + \frac{1}{2}\right)h$, which is
Simpson's approximation (with step size h/2) for the definite integral of f(t) taken
over the interval [a, b].

7. The Richardson improvement method discussed in Lemma 7.1 (Section 7.3) can be
used in conjunction with the Runge-Kutta method. If the Runge-Kutta method of
order N = 4 is used with step size h, we have

$y(b) \approx y_h + Ch^4.$

If the Runge-Kutta method of order N = 4 is used with step size 2h, we have

$y(b) \approx y_{2h} + 16Ch^4.$

The terms involving $Ch^4$ can be eliminated to obtain an improved approximation for
y(b), and the result is

$y(b) \approx \frac{16y_h - y_{2h}}{15}.$

This improvement scheme can be used with the values in Example 9.11 to obtain
better approximations to y(3). Find the missing entries in the table below.



For Exercises 8 and 9, the Taylor polynomial of degree N = 2 for a function f(t, y) of two
variables t and y expanded about the point (a, b) is

$P_2(t, y) = f(a, b) + f_t(a, b)(t - a) + f_y(a, b)(y - b) + \frac{f_{tt}(a, b)(t - a)^2}{2} + f_{ty}(a, b)(t - a)(y - b) + \frac{f_{yy}(a, b)(y - b)^2}{2}.$

8. (a) Find the Taylor polynomial of degree N = 2 for f(t, y) = y/t expanded
about (1, 1).

(b) Find $P_2(1.05, 1.1)$ and compare with f(1.05, 1.1).

9. (a) Find the Taylor polynomial of degree N = 2 for f(t, y) = (1 + t - …
expanded about (0, 0).

(b) Find $P_2(0.04, 0.08)$ and compare with f(0.04, 0.08).


Algorithms and Programs

In Problems 1 through 5, solve the differential equations by the Runge-Kutta method of
order N = 4.

(a) Let h = 0.1 and do 20 steps with Program 9.4. Then let h = 0.05 and do 40 steps
with Program 9.4.

(b) Compare the exact solution y(2) with the two approximations in part (a).

(c) Does the F.G.E. in part (a) behave as expected when h is halved?

(d) Plot the two approximations and the exact solution on the same coordinate system.
Hint. The output matrix R from Program 9.4 contains the x and y coordinates of
the approximations. The command plot(R(:,1),R(:,2)) will produce a graph
analogous to Figure 9.6.

1. $y' = t^2 - y$ with $y(0) = 1$, $y(t) = -e^{-t} + t^2 - 2t + 2$

2. $y' = 3y + 3t$ with $y(0) = 1$, $y(t) = \frac{4}{3}e^{3t} - t - \frac{1}{3}$

3. $y' = -ty$ with $y(0) = 1$, $y(t) = e^{-t^2/2}$

4. $y' = e^{-2t} - 2y$ with $y(0) = \frac{1}{10}$, $y(t) = \frac{1}{10}e^{-2t} + te^{-2t}$

5. $y' = 2ty^2$ with $y(0) = 1$, $y(t) = 1/(1 - t^2)$

In Problems 6 and 7, solve the differential equations by the Runge-Kutta-Fehlberg method.

(a) Use Program 9.5 with initial step size h = 0.1 and tol = $10^{-7}$.

(b) Compare the exact solution y(b) with the approximation.

(c) Plot the approximation and the exact solution on the same coordinate system.

6. $y' = 9te^{3t}$, $y(0) = 0$ over [0, 3], $y(t) = 3te^{3t} - e^{3t} + 1$

7. $y' = 2\tan^{-1}(t)$, $y(0) = 0$ over [0, 1], $y(t) = 2t\tan^{-1}(t) - \ln(1 + t^2)$

8. In a chemical reaction, one molecule of A combines with one molecule of B to form
one molecule of the chemical C. It is found that the concentration y(t) of C at time t
is the solution to the I.V.P.

$y' = k(a - y)(b - y)$ with $y(0) = 0$,

where k is a positive constant and a and b are the initial concentrations of A and
B, respectively. Suppose that k = 0.01, a = 70 millimoles/liter, and b = 50
millimoles/liter. Use the Runge-Kutta method of order N = 4 with h = 0.5 to find
the solution over [0, 20]. Remark. You can compare your computer solution with the
exact solution $y(t) = 350(1 - e^{-0.2t})/(7 - 5e^{-0.2t})$. Observe that the limiting value
is 50 as $t \to +\infty$.

9. By solving an appropriate initial value problem, make a table of values of the function
f(x) given by the following integral:

$f(x) = \frac{1}{2} + \frac{1}{\sqrt{2\pi}}\int_0^x e^{-t^2/2}\,dt$ for $0 \le x \le 3$.

Use the Runge-Kutta method of order N = 4 with h = 0.1 for your computations.
Your solution should agree with the values in the following table. Remark. This is a
good way to generate the table of areas for a standard normal distribution.

x      f(x)
0.0    0.5
0.5    0.6914625
1.0    0.8413448
1.5    0.9331928
2.0    0.9772499
2.5    0.9937903
3.0    0.9986501

10. (a) Write a program to implement the Richardson improvement method discussed
in Exercise 7.

(b) Use your program from part (a) to approximate y(0.8) for the I.V.P. $y' = t^2 + y^2$,
$y(0) = 1$ over [0, 0.8]. The true solution at t = 0.8 is known to be y(0.8) =
5.8486168. Start with the step size h = 0.05. The program should terminate
when the absolute value of the difference between two consecutive Richardson
improvements is $< 10^{-7}$.

11. Consider the first-order integro-ordinary differential equation:

$y' = 1.3y - 0.25y^2 - 0.0001y\int_0^t y(\tau)\,d\tau.$

(a) Use the Runge-Kutta method of order 4 with h = 0.2, and y(0) = 250 over the
interval [0, 20], and the trapezoidal rule to find an approximate solution to the
equation (see Problem 10 in the Algorithms and Programs in Section 9.2).

(b) Repeat part (a) using the initial values y(0) = 200 and y(0) = 300.

(c) Plot the approximate solutions from parts (a) and (b) on the same coordinate
system.


9.6 Predictor-Corrector Methods

The methods of Euler, Heun, Taylor, and Runge-Kutta are called single-step methods
because they use only the information from one previous point to compute the
successive point; that is, only the initial point $(t_0, y_0)$ is used to compute $(t_1, y_1)$ and,
in general, $y_k$ is needed to compute $y_{k+1}$. After several points have been found, it
is feasible to use several prior points in the calculation. For illustration, we develop
the Adams-Bashforth four-step method, which requires $y_{k-3}$, $y_{k-2}$, $y_{k-1}$, and $y_k$ in
the calculation of $y_{k+1}$. This method is not self-starting; four initial points $(t_0, y_0)$,
$(t_1, y_1)$, $(t_2, y_2)$, and $(t_3, y_3)$ must be given in advance in order to generate the points
$\{(t_k, y_k) : k \ge 4\}$.

A desirable feature of a multistep method is that the local truncation error (L.T.E.)
can be determined and a correction term can be included, which improves the accuracy
of the answer at each step. Also, it is possible to determine if the step size is small
enough to obtain an accurate value for $y_{k+1}$, yet large enough so that unnecessary and
time-consuming calculations are eliminated. Using the combination of a predictor
and corrector requires only two function evaluations of f(t, y) per step.

The Adams-Bashforth-Moulton Method 

The Adams-Bashforth-Moulton predictor-corrector method is a multistep method
derived from the fundamental theorem of calculus:

(1) $y(t_{k+1}) = y(t_k) + \int_{t_k}^{t_{k+1}} f(t, y(t))\,dt.$
Figure 9.10 Integration over $[t_k, t_{k+1}]$ in the Adams-Bashforth method. Both panels show
the curve $z = f(t, y(t))$ over the nodes $t_{k-3}$, $t_{k-2}$, $t_{k-1}$, $t_k$, $t_{k+1}$. (a) The four nodes
for the Adams-Bashforth predictor (extrapolation is used). (b) The four nodes for the
Adams-Moulton corrector (interpolation is used).

The predictor uses the Lagrange polynomial approximation for f(t, y(t)) based
on the points $(t_{k-3}, f_{k-3})$, $(t_{k-2}, f_{k-2})$, $(t_{k-1}, f_{k-1})$, and $(t_k, f_k)$. It is integrated over
the interval $[t_k, t_{k+1}]$ in (1). This process produces the Adams-Bashforth predictor:

(2) $p_{k+1} = y_k + \frac{h}{24}(-9f_{k-3} + 37f_{k-2} - 59f_{k-1} + 55f_k).$

The corrector is developed similarly. The value $p_{k+1}$ just computed can now be
used. A second Lagrange polynomial for f(t, y(t)) is constructed, which is based
on the points $(t_{k-2}, f_{k-2})$, $(t_{k-1}, f_{k-1})$, $(t_k, f_k)$, and the new point $(t_{k+1}, f_{k+1}) =
(t_{k+1}, f(t_{k+1}, p_{k+1}))$. This polynomial is then integrated over $[t_k, t_{k+1}]$, producing the
Adams-Moulton corrector:

(3) $y_{k+1} = y_k + \frac{h}{24}(f_{k-2} - 5f_{k-1} + 19f_k + 9f_{k+1}).$

Figure 9.10 shows the nodes for the Lagrange polynomials that are used in developing
formulas (2) and (3), respectively.
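
One predictor-corrector step can be sketched as follows. The Python code is an
illustration only (the function name `abm_step` and its argument layout are assumptions,
and Python replaces the book's MATLAB). It is tried on the I.V.P. y' = (t - y)/2,
y(0) = 1 with h = 1/8, using the Runge-Kutta starting values quoted in Example 9.13 of
this section.

```python
def abm_step(f, t, y, h):
    """One Adams-Bashforth-Moulton step.

    t, y: the four most recent points, ordered from (t_{k-3}, y_{k-3}) to (t_k, y_k).
    Returns (t_{k+1}, p_{k+1}, y_{k+1}).
    """
    fk3, fk2, fk1, fk = [f(tj, yj) for tj, yj in zip(t, y)]
    # Adams-Bashforth predictor
    p = y[3] + h / 24 * (-9 * fk3 + 37 * fk2 - 59 * fk1 + 55 * fk)
    t_next = t[3] + h
    # Adams-Moulton corrector, with f_{k+1} evaluated at the predicted value
    y_next = y[3] + h / 24 * (fk2 - 5 * fk1 + 19 * fk + 9 * f(t_next, p))
    return t_next, p, y_next

f = lambda t, y: (t - y) / 2
h = 0.125
t = [0.0, 0.125, 0.25, 0.375]
y = [1.0, 0.94323919, 0.89749071, 0.86208736]   # starting values from Example 9.13
t4, p4, y4 = abm_step(f, t, y, h)
print(t4, p4, y4)
```

To continue the computation, the oldest point is dropped and the new point
(t_{k+1}, y_{k+1}) is appended before the next call.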

Error Estimation and Correction 

The error terms for the numerical integration formulas used to obtain both the predictor
and corrector are of the order $O(h^5)$. The L.T.E. for formulas (2) and (3) are

(4) $y(t_{k+1}) - p_{k+1} = \frac{251}{720}\,y^{(5)}(c_{k+1})h^5$ (L.T.E. for the predictor),

(5) $y(t_{k+1}) - y_{k+1} = \frac{-19}{720}\,y^{(5)}(d_{k+1})h^5$ (L.T.E. for the corrector).

Suppose that h is small and $y^{(5)}(t)$ is nearly constant over the interval; then the
terms involving the fifth derivative in (4) and (5) can be eliminated, and the result is

(6) $y(t_{k+1}) - y_{k+1} \approx \frac{-19}{270}(y_{k+1} - p_{k+1}).$

Formula (6) gives an approximate error estimate based on the two computed values $p_{k+1}$
and $y_{k+1}$ and does not use $y^{(5)}(t)$.

Figure 9.11 Reduction of the step size to h/2 in an adaptive method. The mesh points
shown are $t_{k-3/2}$, $t_{k-1}$, $t_{k-1/2}$, and $t_k$.


Practical Considerations 

The corrector (3) used the approximation $f_{k+1} \approx f(t_{k+1}, p_{k+1})$ in the calculation
of $y_{k+1}$. Since $y_{k+1}$ is also an estimate for $y(t_{k+1})$, it could be used in the corrector (3)
to generate a new approximation for $f_{k+1}$, which in turn will generate a new value
for $y_{k+1}$. However, when this iteration on the corrector is continued, it will converge
to a fixed point of (3) rather than the differential equation. It is more efficient to reduce
the step size if more accuracy is needed.

Formula (6) can be used to determine when to change the step size. Although
elaborate methods are available, we show how to reduce the step size to h/2 or increase
it to 2h. Let RelErr = $5 \times 10^{-6}$ be our relative error criterion, and let Small = $10^{-5}$.

(7) If $\frac{19}{270}\,\frac{|y_{k+1} - p_{k+1}|}{|y_{k+1}| + \text{Small}} > \text{RelErr}$, then set $h = \frac{h}{2}$.

(8) If $\frac{19}{270}\,\frac{|y_{k+1} - p_{k+1}|}{|y_{k+1}| + \text{Small}} < \frac{\text{RelErr}}{100}$, then set $h = 2h$.

When the predicted and corrected values do not agree to five significant digits, 
then (7) reduces the step size. If they agree to seven or more significant digits, then (8) 
increases the step size. Fine-tuning of these parameters should be made to suit your 
particular computer. 
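
The two tests can be collected in a small helper. The Python sketch below is a hedged
illustration: the function name `adjust_step` and its interface are assumptions, while
the constants RelErr = 5 x 10^{-6} and Small = 10^{-5} are the ones suggested above. It
only decides on the new step size; supplying the starting values that a changed step
size requires is a separate task.

```python
REL_ERR = 5e-6   # relative error criterion RelErr
SMALL = 1e-5     # guards against division by a value near zero

def adjust_step(h, p_next, y_next, rel_err=REL_ERR, small=SMALL):
    """Return h/2, 2h, or h according to the relative error estimate."""
    estimate = 19 / 270 * abs(y_next - p_next) / (abs(y_next) + small)
    if estimate > rel_err:
        return h / 2          # predicted and corrected values disagree too much
    if estimate < rel_err / 100:
        return 2 * h          # agreement is better than needed
    return h                  # keep the current step size

print(adjust_step(0.1, 1.0, 1.001))      # large disagreement: step is halved
print(adjust_step(0.1, 1.0, 1.00001))    # acceptable: step is kept
print(adjust_step(0.1, 1.0, 1.0 + 1e-8)) # very close agreement: step is doubled
```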

Reducing the step size requires four new starting values. Interpolation of f(t, y(t))
with a fourth-degree polynomial is used to supply the missing values that bisect the
intervals $[t_{k-2}, t_{k-1}]$ and $[t_{k-1}, t_k]$. The four mesh points $t_{k-3/2}$, $t_{k-1}$, $t_{k-1/2}$, and $t_k$ used
in the successive calculations are shown in Figure 9.11.

The interpolation formulas needed to obtain the new starting values for the step
size h/2 are

(9) $f_{k-1/2} = \frac{-5f_{k-4} + 28f_{k-3} - 70f_{k-2} + 140f_{k-1} + 35f_k}{128},$

$f_{k-3/2} = \frac{3f_{k-4} - 20f_{k-3} + 90f_{k-2} + 60f_{k-1} - 5f_k}{128}.$

Increasing the step size is an easier task. Seven prior points are needed to double
the step size. The four new points are obtained by omitting every second one, as shown
in Figure 9.12.

Figure 9.12 Increasing the step size to 2h in an adaptive method (new mesh shown
against the old mesh).


Milne-Simpson Method 

Another popular predictor-corrector scheme is known as the Milne-Simpson method.
Its predictor is based on integration of f(t, y(t)) over the interval $[t_{k-3}, t_{k+1}]$:

(10) $y(t_{k+1}) = y(t_{k-3}) + \int_{t_{k-3}}^{t_{k+1}} f(t, y(t))\,dt.$

The predictor uses the Lagrange polynomial approximation for f(t, y(t)) based
on the points $(t_{k-3}, f_{k-3})$, $(t_{k-2}, f_{k-2})$, $(t_{k-1}, f_{k-1})$, and $(t_k, f_k)$. It is integrated over
the interval $[t_{k-3}, t_{k+1}]$. This produces the Milne predictor:

(11) $p_{k+1} = y_{k-3} + \frac{4h}{3}(2f_{k-2} - f_{k-1} + 2f_k).$

The corrector is developed similarly. The value $p_{k+1}$ can now be used. A second
Lagrange polynomial for f(t, y(t)) is constructed, which is based on the points
$(t_{k-1}, f_{k-1})$, $(t_k, f_k)$, and the new point $(t_{k+1}, f_{k+1}) = (t_{k+1}, f(t_{k+1}, p_{k+1}))$. The
polynomial is integrated over $[t_{k-1}, t_{k+1}]$, and the result is the familiar Simpson's rule:

(12) $y_{k+1} = y_{k-1} + \frac{h}{3}(f_{k-1} + 4f_k + f_{k+1}).$


Error Estimation and Correction 

The error terms for the numerical integration formulas used to obtain both the predictor
and corrector are of the order $O(h^5)$. The L.T.E. for the formulas in (11) and (12) are

(13) $y(t_{k+1}) - p_{k+1} = \frac{28}{90}\,y^{(5)}(c_{k+1})h^5$ (L.T.E. for the predictor),

(14) $y(t_{k+1}) - y_{k+1} = \frac{-1}{90}\,y^{(5)}(d_{k+1})h^5$ (L.T.E. for the corrector).

Suppose that h is small enough so that $y^{(5)}(t)$ is nearly constant over the interval
$[t_{k-3}, t_{k+1}]$. Then the terms involving the fifth derivative can be eliminated in (13) and
(14), and the result is

(15) $y(t_{k+1}) - p_{k+1} \approx \frac{28}{29}(y_{k+1} - p_{k+1}).$

Formula (15) gives an error estimate for the predictor that is based on the two
computed values $p_{k+1}$ and $y_{k+1}$ and does not use $y^{(5)}(t)$. It can be used to improve the
predicted value. Under the assumption that the difference between the predicted and
corrected values at each step changes slowly, we can substitute $p_k$ and $y_k$ for $p_{k+1}$ and
$y_{k+1}$ in (15) and get the following modifier:

(16) $m_{k+1} = p_{k+1} + 28\,\frac{y_k - p_k}{29}.$

This modified value is used in place of $p_{k+1}$ in the correction step, and equation (12)
becomes

(17) $y_{k+1} = y_{k-1} + \frac{h}{3}\left(f_{k-1} + 4f_k + f(t_{k+1}, m_{k+1})\right).$

Therefore, the improved (modified) Milne-Simpson method is

(18) $p_{k+1} = y_{k-3} + \frac{4h}{3}(2f_{k-2} - f_{k-1} + 2f_k)$ (predictor)

$m_{k+1} = p_{k+1} + 28\,\frac{y_k - p_k}{29}$ (modifier)

$f_{k+1} = f(t_{k+1}, m_{k+1})$

$y_{k+1} = y_{k-1} + \frac{h}{3}(f_{k-1} + 4f_k + f_{k+1})$ (corrector).

Hamming's method is another important method. We shall omit its derivation, but
furnish a program at the end of the section. As a final precaution we mention that all
the predictor-corrector methods have stability problems. Stability is an advanced topic
and the serious reader should research this subject.

Example 9.13. Use the Adams-Bashforth-Moulton, Milne-Simpson, and Hamming
methods with h = 1/8 and compute approximations for the solution of the I.V.P.

$y' = \frac{t - y}{2}$, $y(0) = 1$ over [0, 3].

A Runge-Kutta method was used to obtain the starting values

$y_1 = 0.94323919$, $y_2 = 0.89749071$, and $y_3 = 0.86208736$.

Then a computer implementation of Programs 9.6 through 9.8 produced the values in
Table 9.12. The error for each entry in the table is given as a multiple of $10^{-8}$. In all entries
there are at least six digits of accuracy. In this example, the best answers were produced by
Hamming's method. ■
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Table 9.12 Comparison of the Adams-Bashfoilh-Moulton, Milne-Simpson, and Hamming 
Methods for Solving y 1 = (t — y)/2, y(0) = 1 


Error 

0 £ -8 
IE —8 
IE-8 
IE-8 
0E-8 
0E-8 
IE —8 
2E-8 
2E-8 
2E-8 
2E-8 
2E-8 


The Right Step 

Our selection of methods has a purposeL first, their development is easy enough for a 
first course; second, more advanced methods have a similar development; third, most 
undergraduate problems can be solved by one of these methods. However, when a 
predictor-corrector method is used to solve the I.V.P. y' = fit, y), where yfo) — yo, 
over a large inteirvai, difficulties sometimes occur. 

If y) < 0 and the step size is too large, a predictor-corrector method might 
be unstable. As a rule of thumb, stability exists when a small error is propagated as a 
decreasing error, and instability exists when a small error is propagated as an increasing 
error. When too large a step size is used qver a large interval, instability will result and 
is sometimes manifest by oscillations in the computed solution. They can be attenuated 
by changing to a smaller step size. Formulas (7) through (9) suggest how to modify 
the algorithm(s). When step-size control is included, the following error estimate(s) 
should be used: 

Pk ~ Vt 

(19) y ( tk) ~ }’k ^ 19- (A.dams-Bashforth-Moulton), 

(20) y (It) - yt = Pt (Milne-Simpson), 

(21) >>((*)= (Hamming). 



Adams- 

Bashforth- 


Milne- 


Hamming’s 

fc 

Moulton 

Error 

Simpson 

Error 

method 

0.0 

1.00000000 

o 

t*i 

l 

00 

1.00000000 

0E-8 

1.00000000 

0.5 

0.83640227 

8E-8 

0.83640231 

4E-8 

0,83640234 

0.625 

0.81984673 

16E-8 

0.81984687 

2£ — 8 

0.81984688 

0.75 

0.81186762 

22E-8 

0.81186778 

6E-8 

0.81186783 

0.875 

0.81194530 

28E-8 

0.81194555 

3£ — 8 

0.81194558 

1.0 

0.81959166 

32E-8 

0.81959190 

8E-8 

0.81959198 

1.5 

0.91709920 

46E-8 

0.91709957 

9E-8 

0.91709967 

2.0 

1.10363781 

51E-8 

L. 10363822 

10E “8 

1.10363834 

2.5 

1.35951387 

52E-8 

1.35951429 

10E-8 

1.35951441 

2.625 

1.43243853 

52E-8 

1.43243899 

6E-8 

1.43243907 

2.75 

1.50851827 

52E-8 

1.50851869 

10E —8 

1.50851881 

2.875 

1.58756195 

51E-8 

1.58756240 

6E —8 

1.58756248 

3,0 

1.66938998 

50E-8 

1.66939038 

10E-8 

1.66939050 
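
The constants in the error estimates (19) through (21) can be traced in the same way as the Milne-Simpson modifier. The sketch below assumes that $y^{(5)}$ is constant over the step and uses the standard local truncation error constants of each predictor-corrector pair; the shorthand $D = y^{(5)} h^5$ is introduced here only for illustration. For Hamming's method the estimate refers to the corrected value before the final modification.

```latex
\begin{aligned}
\text{Adams-Bashforth-Moulton:}\quad
  & y - p = \tfrac{251}{720}D,\quad y - y_k = -\tfrac{19}{720}D
  \;\Longrightarrow\; y_k - p = \tfrac{270}{720}D,\quad
  y - y_k = 19\,\frac{p - y_k}{270}, \\
\text{Milne-Simpson:}\quad
  & y - p = \tfrac{28}{90}D,\quad y - y_k = -\tfrac{1}{90}D
  \;\Longrightarrow\; y_k - p = \tfrac{29}{90}D,\quad
  y - y_k = \frac{p - y_k}{29}, \\
\text{Hamming:}\quad
  & y - p = \tfrac{112}{360}D,\quad y - y_k = -\tfrac{9}{360}D
  \;\Longrightarrow\; y_k - p = \tfrac{121}{360}D,\quad
  y - y_k = 9\,\frac{p - y_k}{121}.
\end{aligned}
```

Here $y$ stands for $y(t_k)$ and $p$ for $p_k$.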


In all methods, the corrector step is a type of fixed-point iteration. It can be proved
that the step size h for the methods must satisfy the following conditions:

(22)  $h \ll \frac{2.66667}{|f_y(t, y)|}$  (Adams-Bashforth-Moulton),

(23)  $h \ll \frac{3.00000}{|f_y(t, y)|}$  (Milne-Simpson),

(24)  $h \ll \frac{2.66667}{|f_y(t, y)|}$  (Hamming).

The notation $\ll$ in (22) through (24) means "much smaller than." The next example
shows that more stringent inequalities should be used:

(25)  $h < \frac{0.75}{|f_y(t, y)|}$  (Adams-Bashforth-Moulton),

(26)  $h < \frac{0.45}{|f_y(t, y)|}$  (Milne-Simpson),

(27)  $h < \frac{0.69}{|f_y(t, y)|}$  (Hamming).

Inequality (27) is found in advanced books on numerical analysis. The other two
inequalities seem appropriate for the example.
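
The numerators 2.66667 and 3.00000 in (22) through (24) can be motivated by viewing each corrector as a fixed-point map. The following is a heuristic sketch, not a proof; the names $G$, $c$, and $\gamma$ are introduced here for illustration, with $\gamma$ denoting the coefficient of $f_{k+1}$ in the corrector.

```latex
\begin{aligned}
& y_{k+1} = G(y_{k+1}), \qquad G(y) = c + \gamma\, h\, f(t_{k+1}, y), \\
& |G'(y)| = \gamma\, h\, |f_y(t_{k+1}, y)| < 1
  \;\Longleftrightarrow\; h < \frac{1/\gamma}{|f_y(t, y)|}, \\
& \gamma = \tfrac{9}{24}\ \text{(Adams-Bashforth-Moulton)}:\ \tfrac{1}{\gamma} = 2.66667, \qquad
  \gamma = \tfrac{1}{3}\ \text{(Milne-Simpson)}:\ \tfrac{1}{\gamma} = 3.00000, \qquad
  \gamma = \tfrac{3}{8}\ \text{(Hamming)}:\ \tfrac{1}{\gamma} = 2.66667.
\end{aligned}
```

The contraction condition only guarantees that the corrector iteration converges; it says nothing about how errors propagate from step to step, which is why the sharper bounds (25) through (27) are needed.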

Example 9.14. Use the Adams-Bashforth-Moulton, Milne-Simpson, and Hamming methods
and compute approximations for the solution of

      y' = 30 - 5y,  y(0) = 1  over the interval [0, 10].

All three methods are of the order $O(h^4)$. When N = 120 steps were used for all three
methods, the maximum error for each method occurred at a different place:

      $y(0.41666667) - y_5 \approx -0.00277037$  (Adams-Bashforth-Moulton),
      $y(0.33333333) - y_4 \approx -0.00139255$  (Milne-Simpson),
      $y(0.33333333) - y_4 \approx -0.00104982$  (Hamming).

At the right end point t = 10, the error was

      $y(10) - y_{120} \approx 0.00000000$  (Adams-Bashforth-Moulton),
      $y(10) - y_{120} \approx 0.00001015$  (Milne-Simpson),
      $y(10) - y_{120} \approx 0.00000000$  (Hamming).

Both the Adams-Bashforth-Moulton and Hamming methods gave approximate solutions
with eight digits of accuracy at the right end point.

Figure 9.13 (a) The Adams-Bashforth-Moulton solution
to y' = 30 - 5y with N = 37 steps produces oscillation.
It is stabilized when N = 65 because $h = 10/65 \approx
0.1538 \approx 0.15 = 0.75/5 = 0.75/|f_y(t, y)|$.

Figure 9.13 (b) The Milne-Simpson solution to y' = 30 -
5y with N = 93 steps produces oscillation. It is stabilized
when N = 110 because $h = 10/110 \approx 0.0909 \approx 0.09 =
0.45/5 = 0.45/|f_y(t, y)|$.

Figure 9.13 (c) Hamming's solution to y' = 30 - 5y
with N = 50 steps produces oscillation. It is stabilized
when N = 70 because $h = 10/70 \approx 0.1428 \approx 0.138 =
0.69/5 = 0.69/|f_y(t, y)|$.

It is instructive to see that if the step size is too large the computed solution
oscillates about the true solution. Figure 9.13 illustrates this phenomenon. The small
number of steps was determined experimentally so that the oscillations were of about the
same magnitude. The large number of steps required to attenuate the oscillations was
determined with equations (25) through (27).

Each of the following three programs requires that the first four coordinates of T
and Y be initial starting values obtained by another method. Consider Example 9.13,
where the step size was h = 1/8 and the interval was [0, 3]. The following string of
commands in the MATLAB command window will produce appropriate input vectors
T and Y.

>>T=zeros(1,25);
>>Y=zeros(1,25);
>>T=0:1/8:3;
>>Y(1:4)=[1 0.94323919 0.89749071 0.86208736];
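
Instead of typing the starting values by hand, they can be generated with the classical fourth-order Runge-Kutta method. The following is a minimal sketch under stated assumptions: the function handle f and the stage names k1 through k4 are illustrative choices made here, not part of Programs 9.6 through 9.8, and the values it produces agree with the quoted starting values only to roughly seven decimal places.

```matlab
% Sketch: build T and Y for Example 9.13 and fill Y(1:4) with RK4 starting values.
% The handle f and the variable names are illustrative assumptions.
f = @(t,y) (t - y)/2;            % right-hand side of y' = (t - y)/2
h = 1/8;
T = 0:h:3;                       % 25 abscissas on [0, 3]
Y = zeros(1,length(T));
Y(1) = 1;                        % initial condition y(0) = 1
for k = 1:3                      % three RK4 steps give Y(2), Y(3), Y(4)
    k1 = f(T(k),       Y(k));
    k2 = f(T(k) + h/2, Y(k) + h*k1/2);
    k3 = f(T(k) + h/2, Y(k) + h*k2/2);
    k4 = f(T(k) + h,   Y(k) + h*k3);
    Y(k+1) = Y(k) + h*(k1 + 2*k2 + 2*k3 + k4)/6;
end
```

The remaining entries Y(5:25) stay zero until a predictor-corrector program overwrites them.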


Program 9.6 (Adams-Bashforth-Moulton Method). To approximate the solution
of the initial value problem y' = f(t, y) with $y(a) = y_0$ over [a, b] by using the
predictor

      $p_{k+1} = y_k + \frac{h}{24}\,(-9f_{k-3} + 37f_{k-2} - 59f_{k-1} + 55f_k)$

and the corrector

      $y_{k+1} = y_k + \frac{h}{24}\,(f_{k-2} - 5f_{k-1} + 19f_k + 9f_{k+1})$.

function A=abm(f,T,Y)
%Input  - f is the function entered as a string 'f'
%       - T is the vector of abscissas
%       - Y is the vector of ordinates
%Remark. The first four coordinates of T and Y must
%        have starting values obtained with RK4
%Output - A=[T' Y'] where T is the vector of abscissas and
%         Y is the vector of ordinates
n=length(T);
if n<5,return,end;
F=zeros(1,4);
F=feval(f,T(1:4),Y(1:4));
h=T(2)-T(1);
for k=4:n-1
   %Predictor
   p=Y(k)+(h/24)*(F*[-9 37 -59 55]');
   T(k+1)=T(1)+h*k;
   F=[F(2) F(3) F(4) feval(f,T(k+1),p)];
   %Corrector
   Y(k+1)=Y(k)+(h/24)*(F*[1 -5 19 9]');
   F(4)=feval(f,T(k+1),Y(k+1));
end
A=[T' Y'];
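
As a usage illustration, the loop of Program 9.6 can be run as a self-contained script on the problem of Example 9.13. This is a sketch under stated assumptions: it uses a function handle in place of the string argument, and it compares the result with the closed form $y(t) = 3e^{-t/2} - 2 + t$, which can be checked by substitution into y' = (t - y)/2 with y(0) = 1. The names f, exact, and maxerr are chosen here for illustration.

```matlab
% Sketch: Adams-Bashforth-Moulton loop applied to y' = (t - y)/2, y(0) = 1 on [0, 3].
f = @(t,y) (t - y)/2;                     % vectorized right-hand side (assumed handle)
T = 0:1/8:3;
Y = zeros(1,25);
Y(1:4) = [1 0.94323919 0.89749071 0.86208736];   % starting values from Example 9.13
h = T(2) - T(1);
F = f(T(1:4), Y(1:4));                    % holds f_{k-3}, f_{k-2}, f_{k-1}, f_k
for k = 4:length(T)-1
    p = Y(k) + (h/24)*(F*[-9 37 -59 55]');        % predictor
    F = [F(2:4) f(T(k+1), p)];                    % shift window, append f at predicted value
    Y(k+1) = Y(k) + (h/24)*(F*[1 -5 19 9]');      % corrector
    F(4) = f(T(k+1), Y(k+1));                     % re-evaluate f at corrected value
end
exact  = 3*exp(-T/2) - 2 + T;             % closed-form solution used for comparison
maxerr = max(abs(exact - Y));             % largest error over the mesh
```

The first corrected value Y(5) should reproduce the entry for t = 0.5 in Table 9.12 to about seven decimal places.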

Program 9.7 (Milne-Simpson Method). To approximate the solution of the initial
value problem y' = f(t, y) with $y(a) = y_0$ over [a, b] by using the predictor

      $p_{k+1} = y_{k-3} + \frac{4h}{3}\,(2f_{k-2} - f_{k-1} + 2f_k)$

and the corrector

      $y_{k+1} = y_{k-1} + \frac{h}{3}\,(f_{k-1} + 4f_k + f_{k+1})$.

function M=milne(f,T,Y)
%Input  - f is the function entered as a string 'f'
%       - T is the vector of abscissas
%       - Y is the vector of ordinates
%Remark. The first four coordinates of T and Y must
%        have starting values obtained with RK4
%Output - M=[T' Y'] where T is the vector of abscissas and
%         Y is the vector of ordinates
n=length(T);
if n<5,return,end;
F=zeros(1,4);
F=feval(f,T(1:4),Y(1:4));
h=T(2)-T(1);
pold=0;
yold=0;
for k=4:n-1
   %Predictor
   pnew=Y(k-3)+(4*h/3)*(F(2:4)*[2 -1 2]');
   %Modifier
   pmod=pnew+28*(yold-pold)/29;
   T(k+1)=T(1)+h*k;
   F=[F(2) F(3) F(4) feval(f,T(k+1),pmod)];
   %Corrector
   Y(k+1)=Y(k-1)+(h/3)*(F(2:4)*[1 4 1]');
   pold=pnew;
   yold=Y(k+1);
   F(4)=feval(f,T(k+1),Y(k+1));
end
M=[T' Y'];

Program 9.8 (Hamming Method). To approximate the solution of the initial value
problem y' = f(t, y) with $y(a) = y_0$ over [a, b] by using the predictor

      $p_{k+1} = y_{k-3} + \frac{4h}{3}\,(2f_{k-2} - f_{k-1} + 2f_k)$

and the corrector

      $y_{k+1} = \frac{-y_{k-2} + 9y_k}{8} + \frac{3h}{8}\,(-f_{k-1} + 2f_k + f_{k+1})$.

function H=hamming(f,T,Y)
%Input  - f is the function entered as a string 'f'
%       - T is the vector of abscissas
%       - Y is the vector of ordinates
%Remark. The first four coordinates of T and Y must
%        have starting values obtained with RK4
%Output - H=[T' Y'] where T is the vector of abscissas and
%         Y is the vector of ordinates
n=length(T);
if n<5,return,end;
F=zeros(1,4);
F=feval(f,T(1:4),Y(1:4));
h=T(2)-T(1);
pold=0;
cold=0;
for k=4:n-1
   %Predictor
   pnew=Y(k-3)+(4*h/3)*(F(2:4)*[2 -1 2]');
   %Modifier
   pmod=pnew+112*(cold-pold)/121;
   T(k+1)=T(1)+h*k;
   F=[F(2) F(3) F(4) feval(f,T(k+1),pmod)];
   %Corrector
   cnew=(9*Y(k)-Y(k-2)+3*h*(F(2:4)*[-1 2 1]'))/8;
   Y(k+1)=cnew+9*(pnew-cnew)/121;
   pold=pnew;
   cold=cnew;
   F(4)=feval(f,T(k+1),Y(k+1));
end
H=[T' Y'];


Exercises for Predictor-Corrector Methods

In Exercises 1 through 3, use the Adams-Bashforth-Moulton method, the three starting
values $y_1$, $y_2$, and $y_3$, and the step size h = 0.05 to calculate by hand the next two values
$y_4$ and $y_5$ for the I.V.P. Compare your solution with the exact solution y(t).

1. $y' = t^2 - y$, y(0) = 1 over [0, 5], $y(t) = -e^{-t} + t^2 - 2t + 2$
      y(0.05) = 0.95127058
      y(0.10) = 0.90516258
      y(0.15) = 0.86179202

2. $y' = y + 3t - t^2$, y(0) = 1 over [0, 5], $y(t) = 2e^{t} + t^2 - t - 1$
      y(0.05) = 1.0550422
      y(0.10) = 1.1203418
      y(0.15) = 1.1961685

3. $y' = -t/y$, y(1) = 1 over [1, 1.4], $y(t) = (2 - t^2)^{1/2}$
      y(1.05) = 0.94736477
      y(1.10) = 0.88881944
      y(1.15) = 0.82310388

In Exercises 4 through 6, use the Milne-Simpson method, the three starting values $y_1$, $y_2$,
and $y_3$, and the step size h = 0.05 to calculate by hand the next two values $y_4$ and $y_5$ for
the I.V.P. Compare your solution with the exact solution y(t).

4. $y' = e^{-t} - y$, y(0) = 1 over [0, 5], $y(t) = te^{-t} + e^{-t}$
      y(0.05) = 0.99879090
      y(0.10) = 0.99532116
      y(0.15) = 0.98981417

5. $y' = 2ty^2$, y(0) = 1 over [0, 0.95], $y(t) = 1/(1 - t^2)$
      y(0.05) = 1.0025063
      y(0.10) = 1.0101010
      y(0.15) = 1.0230179

6. $y' = 1 + y^2$, y(0) = 1 over [0, 0.75], $y(t) = \tan(t + \pi/4)$
      y(0.05) = 1.1053556
      y(0.10) = 1.2230489
      y(0.15) = 1.3560879

In Exercises 7 through 9, use the Hamming method, the three starting values $y_1$, $y_2$, and
$y_3$, and the step size h = 0.05 to calculate by hand the next two values $y_4$ and $y_5$ for the
I.V.P. Compare your solution with the exact solution y(t).

7. $y' = 2y - y^2$, y(0) = 1 over [0, 5], $y(t) = 1 + \tanh(t)$
      y(0.05) = 1.0499584
      y(0.10) = 1.0996680
      y(0.15) = 1.1488850

8. $y' = (1 - y^2)^{1/2}$, y(0) = 0 over [0, 1.55], $y(t) = \sin(t)$
      y(0.05) = 0.049979169
      y(0.10) = 0.099833417
      y(0.15) = 0.14943813

9. $y' = y^2 \sin(t)$, y(0) = 1 over [0, 1.55], $y(t) = \sec(t)$
      y(0.05) = 1.0012513
      y(0.10) = 1.0050209
      y(0.15) = 1.0113564


Algorithms and Programs

1. (a) Use Program 9.6 to solve the differential equations in Exercises 1 through 3.
   (b) Plot your approximation and the exact solution on the same coordinate system.

2. (a) Use Program 9.7 to solve the differential equations in Exercises 4 through 6.
   (b) Plot your approximation and the exact solution on the same coordinate system.

3. (a) Use Program 9.8 to solve the differential equations in Exercises 7 through 9.
   (b) Plot your approximation and the exact solution on the same coordinate system.

4. Produce a graph analogous to Figure 9.13 by using Program 9.6 with N = 31 and
   N = 65 to solve the I.V.P.

      y' = 30 - 5y,  y(0) = 1  over [0, 10].

5. For the I.V.P. y' = 45 - 9y, y(1) = 0 over [1, 20]:

   (a) Use inequality (22) to determine for which step sizes the Adams-Bashforth-
       Moulton method might be unstable.

   (b) Based on your results from part (a), select step sizes $h_s$ and $h_u$ for which the
       Adams-Bashforth-Moulton method should be stable and unstable, respectively.
       Use a Runge-Kutta method to generate three starting values $y_1$, $y_2$, and $y_3$ for
       each of the step sizes.


Sec. 9.7 Systems of Differential Equations

Numerical Solutions

A numerical solution to (1) over the interval $a \le t \le b$ is found by considering the
differentials

(5)  $dx = f(t, x, y)\,dt$  and  $dy = g(t, x, y)\,dt$.

Euler's method for solving the system is easy to formulate. The differentials
$dt = t_{k+1} - t_k$, $dx = x_{k+1} - x_k$, and $dy = y_{k+1} - y_k$ are substituted into (5) to get

(6)  $x_{k+1} - x_k \approx f(t_k, x_k, y_k)(t_{k+1} - t_k)$,
     $y_{k+1} - y_k \approx g(t_k, x_k, y_k)(t_{k+1} - t_k)$.

The interval is divided into M subintervals of width h = (b - a)/M, and the mesh
points are $t_{k+1} = t_k + h$. This is used in (6) to get the recursive formulas for Euler's
method:

(7)  $t_{k+1} = t_k + h$,
     $x_{k+1} = x_k + h f(t_k, x_k, y_k)$,
     $y_{k+1} = y_k + h g(t_k, x_k, y_k)$  for k = 0, 1, ..., M - 1.

A higher-order method should be used to achieve a reasonable amount of accuracy.
For example, the Runge-Kutta formulas of order 4 are

(8)  $x_{k+1} = x_k + \frac{h}{6}\,(f_1 + 2f_2 + 2f_3 + f_4)$,
     $y_{k+1} = y_k + \frac{h}{6}\,(g_1 + 2g_2 + 2g_3 + g_4)$,

where

     $f_1 = f(t_k, x_k, y_k)$,
     $f_2 = f\bigl(t_k + \frac{h}{2},\, x_k + \frac{h}{2} f_1,\, y_k + \frac{h}{2} g_1\bigr)$,
     $f_3 = f\bigl(t_k + \frac{h}{2},\, x_k + \frac{h}{2} f_2,\, y_k + \frac{h}{2} g_2\bigr)$,
     $f_4 = f(t_k + h,\, x_k + h f_3,\, y_k + h g_3)$,

     $g_1 = g(t_k, x_k, y_k)$,
     $g_2 = g\bigl(t_k + \frac{h}{2},\, x_k + \frac{h}{2} f_1,\, y_k + \frac{h}{2} g_1\bigr)$,
     $g_3 = g\bigl(t_k + \frac{h}{2},\, x_k + \frac{h}{2} f_2,\, y_k + \frac{h}{2} g_2\bigr)$,
     $g_4 = g(t_k + h,\, x_k + h f_3,\, y_k + h g_3)$.


Example 9.15. Use the Runge-Kutta method given in (8) and compute the numerical
solution to (3) over the interval [0.0, 0.2] using ten subintervals and the step size h = 0.02.
For the first point we have $t_1 = 0.02$ and the intermediate calculations required to
compute $x_1$ and $y_1$ are

   $f_1 = f(0.00, 6.0, 4.0) = 14.0$                      $g_1 = g(0.00, 6.0, 4.0) = 26.0$
   $x_0 + \frac{h}{2} f_1 = 6.14$                         $y_0 + \frac{h}{2} g_1 = 4.26$
   $f_2 = f(0.01, 6.14, 4.26) = 14.66$                   $g_2 = g(0.01, 6.14, 4.26) = 26.94$
   $x_0 + \frac{h}{2} f_2 = 6.1466$                       $y_0 + \frac{h}{2} g_2 = 4.2694$
   $f_3 = f(0.01, 6.1466, 4.2694) = 14.6854$             $g_3 = g(0.01, 6.1466, 4.2694) = 26.9786$
   $x_0 + h f_3 = 6.293708$                               $y_0 + h g_3 = 4.539572$
   $f_4 = f(0.02, 6.293708, 4.539572) = 15.372852$       $g_4 = g(0.02, 6.293708, 4.539572) = 27.960268$

These values are used in the final computation:

   $x_1 = 6 + \frac{0.02}{6}\,(14.0 + 2(14.66) + 2(14.6854) + 15.372852) = 6.29354551$,

   $y_1 = 4 + \frac{0.02}{6}\,(26.0 + 2(26.94) + 2(26.9786) + 27.960268) = 4.53932490$.

The calculations are summarized in Table 9.13. ■

The numerical solutions contain a certain amount of error at each step. For the
example above, the error grows, and at the right end point t = 0.2 it reaches its maximum:

   $x(0.2) - x_{10} = 10.5396252 - 10.5396230 = 0.0000022$,
   $y(0.2) - y_{10} = 11.7157841 - 11.7157807 = 0.0000034$.
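
The hand computation can be checked with a short script that applies formulas (8) for ten steps. This is a sketch under a stated assumption: system (3) is taken to be x' = x + 2y, y' = 3x + 2y with x(0) = 6 and y(0) = 4, a form inferred here because it reproduces every intermediate value quoted in the example ($f_1 = 14.0$, $g_1 = 26.0$, $f_2 = 14.66$, and so on). The handles f and g and the arrays X and Yv are illustrative names.

```matlab
% Sketch: RK4 formulas (8) for a system of two equations, Example 9.15 data.
% ASSUMPTION: system (3) is x' = x + 2y, y' = 3x + 2y (inferred from the quoted values).
f = @(t,x,y) x + 2*y;
g = @(t,x,y) 3*x + 2*y;
h = 0.02;  M = 10;
t = 0;  x = 6;  y = 4;
X  = zeros(1,M+1);  X(1)  = x;    % X(k+1) stores x_k
Yv = zeros(1,M+1);  Yv(1) = y;    % Yv(k+1) stores y_k
for k = 1:M
    f1 = f(t, x, y);                          g1 = g(t, x, y);
    f2 = f(t+h/2, x+h*f1/2, y+h*g1/2);        g2 = g(t+h/2, x+h*f1/2, y+h*g1/2);
    f3 = f(t+h/2, x+h*f2/2, y+h*g2/2);        g3 = g(t+h/2, x+h*f2/2, y+h*g2/2);
    f4 = f(t+h,   x+h*f3,   y+h*g3);          g4 = g(t+h,   x+h*f3,   y+h*g3);
    x = x + h*(f1 + 2*f2 + 2*f3 + f4)/6;
    y = y + h*(g1 + 2*g2 + 2*g3 + g4)/6;
    t = t + h;
    X(k+1) = x;  Yv(k+1) = y;
end
```

After the loop, X(2) and Yv(2) hold $x_1$ and $y_1$, and X(end) and Yv(end) hold the approximations at t = 0.2.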


Higher-order Differential Equations 

Higher-order differential equations involve the higher derivatives $x''(t)$, $x'''(t)$, and so
on. They arise in mathematical models for problems in physics and engineering. For
example,

      $m x''(t) + c x'(t) + k x(t) = g(t)$

represents a mechanical system in which a spring with spring constant k restores a
displaced mass m. Damping is assumed to be proportional to the velocity, and the
function g(t) is an external force. It is often the case that the position $x(t_0)$ and velocity
$x'(t_0)$ are known at a certain time $t_0$.

By solving for the second derivative, we can write a second-order initial value
problem in the form

(9)  $x''(t) = f(t, x(t), x'(t))$  with  $x(t_0) = x_0$  and  $x'(t_0) = y_0$.

The second-order differential equation can be reformulated as a system of two first-
order equations if we use the substitution

(10)  $x'(t) = y(t)$.

Then $x''(t) = y'(t)$ and the differential equation in (9) becomes a system:

(11)  $\frac{dx}{dt} = y$,
      $\frac{dy}{dt} = f(t, x, y)$
      with  $x(t_0) = x_0$,  $y(t_0) = y_0$.

A numerical procedure such as the Runge-Kutta method can be used to solve (11)
and will generate two sequences $\{x_k\}$ and $\{y_k\}$. The first sequence is the numerical
solution to (9). The next example can be interpreted as damped harmonic motion.
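
The reduction from (9) to (11) can be sketched in MATLAB for the damped equation x'' + 4x' + 5x = 0 with x(0) = 3 and x'(0) = -5, the problem treated in Example 9.16 and Table 9.14. This is an illustrative sketch, not the text's program: the handle F, the state array Z (whose columns hold $x_k$ and $y_k$), and the error variable err are names chosen here. The comparison uses the true solution quoted in Example 9.16.

```matlab
% Sketch: second-order I.V.P. rewritten as the first-order system (11), solved with RK4.
% State Z = [x y] with y = x'; F returns [x' y'] = [y, -5x - 4y].
F = @(t,Z) [Z(2), -5*Z(1) - 4*Z(2)];
h = 0.1;  M = 50;
T = (0:M)*h;                      % mesh on [0, 5]
Z = zeros(M+1,2);
Z(1,:) = [3 -5];                  % x(0) = 3, y(0) = x'(0) = -5
for k = 1:M
    k1 = F(T(k),       Z(k,:));
    k2 = F(T(k) + h/2, Z(k,:) + h*k1/2);
    k3 = F(T(k) + h/2, Z(k,:) + h*k2/2);
    k4 = F(T(k) + h,   Z(k,:) + h*k3);
    Z(k+1,:) = Z(k,:) + h*(k1 + 2*k2 + 2*k3 + k4)/6;
end
xexact = 3*exp(-2*T).*cos(T) + exp(-2*T).*sin(T);   % true solution from Example 9.16
err = max(abs(Z(:,1)' - xexact));                   % largest error in the x sequence
```

Only the first column of Z is the numerical solution to the second-order problem; the second column approximates the velocity x'(t).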


Table 9.14 Runge-Kutta Solution to x''(t) + 4x'(t) + 5x(t) = 0 with
the Initial Conditions x(0) = 3 and x'(0) = -5

    k     t_k        x_k            x(t_k)
    0     0.0     3.00000000     3.00000000
    1     0.1     2.52564583     2.52565822
    2     0.2     2.10402783     2.10404686
    3     0.3     1.73506269     1.73508427
    4     0.4     1.41653369     1.41655509
    5     0.5     1.14488509     1.14490455
   10     1.0     0.33324302     0.33324661
   20     2.0    -0.00620684    -0.00621162
   30     3.0    -0.00701079    -0.00701204
   40     4.0    -0.00091163    -0.00091170
   48     4.8    -0.00004972    -0.00004969
   49     4.9    -0.00002348    -0.00002345
   50     5.0    -0.00000493    -0.00000490


Example 9.16. Consider the second-order initial value problem

      x''(t) + 4x'(t) + 5x(t) = 0  with  x(0) = 3  and  x'(0) = -5.

(a) Write down the equivalent system of two first-order equations.

(b) Use the Runge-Kutta method to solve the reformulated problem over [0, 5] using
    M = 50 subintervals of width h = 0.1.

(c) Compare the numerical solution with the true solution:

      $x(t) = 3e^{-2t}\cos(t) + e^{-2t}\sin(t)$.

The differential equation has the form

      $x''(t) = f(t, x(t), x'(t)) = -4x'(t) - 5x(t)$.

Using the substitution in (10), we get the reformulated problem:

      $\frac{dx}{dt} = y$,
      $\frac{dy}{dt} = -5x - 4y$
      with  x(0) = 3,  y(0) = -5.

Samples of the numerical computations are given in Table 9.14. The values $\{y_k\}$ are
extraneous and are not included. Instead, the true solution values $\{x(t_k)\}$ are included for
comparison. ■

Exercises for Systems of Differential Equations


In Exercises 1 through 4, use h = 0.05 and

(a) Euler's method (7) by hand to find $(x_1, y_1)$ and $(x_2, y_2)$.

(b) the Runge-Kutta method (8) by hand to find $(x_1, y_1)$.

1. Solve the system x' = 2x + 3y, y' = 2x + y with the initial condition x(0) = -2.7 and
   y(0) = 2.8 over the interval $0 \le t \le 1.0$ using the step size h = 0.05. The polygonal
   path formed by the solution set is given in Figure 9.14 and can be compared with the
   analytic solution:

      $x(t) = -\frac{69}{25}e^{-t} + \frac{3}{50}e^{4t}$  and  $y(t) = \frac{69}{25}e^{-t} + \frac{1}{25}e^{4t}$.

2. Solve the system x' = 3x - y, y' = 4x - y with the initial condition x(0) = 0.2 and
   y(0) = 0.5 over the interval $0 \le t \le 2$ using the step size h = 0.05. The polygonal
   path formed by the solution set is given in Figure 9.15 and can be compared with the
   analytic solution:

      $x(t) = \frac{1}{5}e^{t} - \frac{1}{10}te^{t}$  and  $y(t) = \frac{1}{2}e^{t} - \frac{1}{5}te^{t}$.

3. Solve the system x' = x - 4y, y' = x + y with the initial condition x(0) = 2 and
   y(0) = 3 over the interval $0 \le t \le 2$ using the step size h = 0.05. The polygonal
   path formed by the solution set is given in Figure 9.16 and can be compared with the
   analytic solution:

      $x(t) = -2e^{t} + 4e^{t}\cos^2(t) - 12e^{t}\cos(t)\sin(t)$

   and

      $y(t) = -3e^{t} + 6e^{t}\cos^2(t) + 2e^{t}\cos(t)\sin(t)$.

Figure 9.14 The solution to the system x' = 2x + 3y and y' = 2x + y over
[0.0, 1.0].

Figure 9.15 The solution to the system x' = 3x - y and y' = 4x - y over
[0.0, 2.0].

4. Solve the system x' = y - 4x, y' = x + y with the initial condition x(0) = 1 and
   y(0) = 1 over the interval $0 \le t \le 1.2$ using the step size h = 0.05. The polygonal
   path formed by the solution set is given in Figure 9.17 and can be compared with the
   analytic solution:

      $x(t) = \frac{3e^{-\sqrt{29}t/2} - 3e^{\sqrt{29}t/2}}{2\sqrt{29}\,e^{3t/2}} + \frac{e^{-\sqrt{29}t/2} + e^{\sqrt{29}t/2}}{2e^{3t/2}}$

   and

      $y(t) = \frac{-7e^{-\sqrt{29}t/2} + 7e^{\sqrt{29}t/2}}{2\sqrt{29}\,e^{3t/2}} + \frac{e^{-\sqrt{29}t/2} + e^{\sqrt{29}t/2}}{2e^{3t/2}}$.

In Exercises 5 through 8:

(a) Verify that the function x(t) is the solution.

(b) Reformulate the second-order differential equation as a system of two first-order
    equations.

(c) Use h = 0.1 and Euler's method by hand to find $x_1$ and $x_2$.

(d) Use h = 0.05 and the Runge-Kutta method by hand to find $x_1$.

5. $2x''(t) - 5x'(t) - 3x(t) = 45e^{2t}$ with x(0) = 2 and x'(0) = 1
   $x(t) = 4e^{-t/2} + 7e^{3t} - 9e^{2t}$

6. $x''(t) + 6x'(t) + 9x(t) = 0$ with x(0) = 4 and x'(0) = -4
   $x(t) = 4e^{-3t} + 8te^{-3t}$

7. $x''(t) + x(t) = 6\cos(t)$ with x(0) = 2 and x'(0) = 3
   $x(t) = 2\cos(t) + 3\sin(t) + 3t\sin(t)$

8. $x''(t) + 3x'(t) = 12$ with x(0) = 5 and x'(0) = 1
   $x(t) = 4 + 4t + e^{-3t}$

Figure 9.16 The solution to the system x' = x - 4y and y' = x + y over
[0.0, 2.0].

Figure 9.17 The solution to the system x' = y - 4x and y' = x + y over
[0.0, 1.2].

Algorithms and Programs


1. Write a program to solve a system of equations by the Runge-Kutta method of order
   N = 4 (8).

In Problems 2 through 5, use your computer implementation of the Runge-Kutta method
for systems to solve each system using the step size h = 0.05. Plot your approximation
and the analytic solution on the same coordinate system.

2. x' = 2x + 3y, y' = 2x + y, with x(0) = -2.7, y(0) = 2.8 over $0 \le t \le 1.0$
   $x(t) = -\frac{69}{25}e^{-t} + \frac{3}{50}e^{4t}$  and  $y(t) = \frac{69}{25}e^{-t} + \frac{1}{25}e^{4t}$

3. x' = 3x - y, y' = 4x - y, with x(0) = 0.2, y(0) = 0.5 over $0 \le t \le 2$
   $x(t) = \frac{1}{5}e^{t} - \frac{1}{10}te^{t}$  and  $y(t) = \frac{1}{2}e^{t} - \frac{1}{5}te^{t}$

4. x' = x - 4y, y' = x + y, with x(0) = 2, y(0) = 3 over $0 \le t \le 2$
   $x(t) = -2e^{t} + 4e^{t}\cos^2(t) - 12e^{t}\cos(t)\sin(t)$
   $y(t) = -3e^{t} + 6e^{t}\cos^2(t) + 2e^{t}\cos(t)\sin(t)$

5. x' = y - 4x, y' = x + y, with x(0) = 1, y(0) = 1 over $0 \le t \le 1.2$
   $x(t) = \frac{3e^{-\sqrt{29}t/2} - 3e^{\sqrt{29}t/2}}{2\sqrt{29}\,e^{3t/2}} + \frac{e^{-\sqrt{29}t/2} + e^{\sqrt{29}t/2}}{2e^{3t/2}}$
   $y(t) = \frac{-7e^{-\sqrt{29}t/2} + 7e^{\sqrt{29}t/2}}{2\sqrt{29}\,e^{3t/2}} + \frac{e^{-\sqrt{29}t/2} + e^{\sqrt{29}t/2}}{2e^{3t/2}}$

In Problems 6 through 9:

(a) Reformulate the second-order differential equation as a system of two first-order
    equations.

(b) Use your computer implementation of the Runge-Kutta method for systems to solve
    each system over the interval [0, 2] with the step size h = 0.05.

(c) Plot your approximation and the analytic solution on the same coordinate system.

6. $2x''(t) - 5x'(t) - 3x(t) = 45e^{2t}$ with x(0) = 2 and x'(0) = 1
   $x(t) = 4e^{-t/2} + 7e^{3t} - 9e^{2t}$

7. $x''(t) + 6x'(t) + 9x(t) = 0$ with x(0) = 4 and x'(0) = -4
   $x(t) = 4e^{-3t} + 8te^{-3t}$

8. $x''(t) + x(t) = 6\cos(t)$ with x(0) = 2 and x'(0) = 3
   $x(t) = 2\cos(t) + 3\sin(t) + 3t\sin(t)$

9. $x''(t) + 3x'(t) = 12$ with x(0) = 5 and x'(0) = 1
   $x(t) = 4 + 4t + e^{-3t}$

In Problems 10 through 19, use your computer implementation of the Runge-Kutta method
of order N = 4 to solve the given differential equation or system of equations. Plot each
approximation.

10. A certain resonant spring system with a periodic forcing function is modeled by

      x''(t) + 25x(t) = 8 sin(5t)  with  x(0) = 0  and  x'(0) = 0.

    Use the Runge-Kutta method to solve the differential equation over the interval [0, 2]
    using M = 40 steps and h = 0.05.

11. The mathematical model of a certain RLC electrical circuit is

      Q''(t) + 20Q'(t) + 125Q(t) = 9 sin(5t)

    with Q(0) = 0 and Q'(0) = 0. Use the Runge-Kutta method to solve the differential
    equation over the interval [0, 2] using M = 40 steps and h = 0.05. Remark. I(t) =
    Q'(t) is the current at time t.

12. At time t, a pendulum makes an angle x(t) with the vertical axis. Assuming that there
    is no friction, the equation of motion is

      m l x''(t) = -m g sin(x(t)),

    where m is the mass and l is the length of the string. Use the Runge-Kutta method
    to solve the differential equation over the interval [0, 2] using M = 40 steps and
    h = 0.05 if g = 32 ft/sec^2 and
    (a) l = 3.2 ft and x(0) = 0.3 and x'(0) = 0.
    (b) l = 0.8 ft and x(0) = 0.3 and x'(0) = 0.

13. Predator-prey model. An example of a system of nonlinear differential equations
    is the predator-prey problem. Let x(t) and y(t) denote the population of rabbits and
    foxes, respectively, at time t. The predator-prey model asserts that x(t) and y(t)
    satisfy

      x'(t) = Ax(t) - Bx(t)y(t),
      y'(t) = Cx(t)y(t) - Dy(t).

    A typical computer simulation might use the coefficients

      A = 2,  B = 0.02,  C = 0.0002,  D = 0.8.

    Use the Runge-Kutta method to solve the system of differential equations over the
    interval [0, 5] using M = 50 steps and h = 0.2 if

    (a) x(0) = 3000 rabbits and y(0) = 120 foxes.

    (b) x(0) = 5000 rabbits and y(0) = 100 foxes.

14. Solve x' = x - xy, y' = -y + xy with x(0) = 4 and y(0) = 1 over [0, 8] using
    h = 0.1. The trajectories of this system form closed paths. The polygonal path
    formed by the solution set is one of the curves shown in Figure 9.18.

15. Solve $x' = -3x - 2y - 2xy^2$, $y' = 2x - y + 2y^3$ with x(0) = 0.8 and y(0) = 0.6
    over [0, 4] using h = 0.1. For this system, the origin is classified as a spiral point that
    is asymptotically stable. The polygonal path formed by the solution set is one of the
    curves shown in Figure 9.19.

16. Solve $x' = y^2 - x^2$, $y' = 2xy$ with x(0) = 2.0 and y(0) = 0.1 over [0.0, 1.5]
    using h = 0.05. For this system, there is an unstable saddle point at the origin. The
    polygonal path formed by the solution set is one of the curves shown in Figure 9.20.

Figure 9.20 Solutions to the system $x' = y^2 - x^2$ and $y' = 2xy$.

Figure 9.21 Solutions to the system $x' = 1 - y$ and $y' = x^2 - y^2$.

17. Solve $x' = 1 - y$, $y' = x^2 - y^2$ with x(0) = -1.2 and y(0) = 0.0 over [0, 5] using
    h = 0.1. The point (1, 1) is a spiral point that is asymptotically stable, and the point
    (-1, 1) is an unstable saddle point. The polygonal path formed by the solution set is
    one of the curves shown in Figure 9.21.

18. Solve $x' = x^3 - 2xy^2$, $y' = 2x^2y - y^3$ with x(0) = 1.0 and y(0) = 0.2 over
    [0, 2] using h = 0.025. This system has an unstable critical point at the origin. The
    polygonal path formed by the solution set is one of the curves shown in Figure 9.22.

19. Solve $x' = x^2 - y^2$, $y' = 2xy$ with x(0) = 2.0 and y(0) = 0.6 over [0.0, 1.6] using
    h = 0.02. The origin is an unstable critical point. The polygonal path formed by the
    solution set is one of the curves shown in Figure 9.23.

Figure 9.22 Solutions to the system $x' = x^3 - 2xy^2$ and $y' = 2x^2y - y^3$.

Figure 9.23 Solutions to the system $x' = x^2 - y^2$ and $y' = 2xy$.

9.8 Boundary Value Problems

Another type of differential equation has the form

(1)  $x'' = f(t, x, x')$  for  $a \le t \le b$,

with the boundary conditions

(2)  $x(a) = \alpha$  and  $x(b) = \beta$.

This is called a boundary value problem.

The conditions that guarantee that a solution to (1) exists should be checked
before any numerical scheme is applied; otherwise, a list of meaningless output may be
generated. The general conditions are stated in the following theorem.

Theorem 9.8 (Boundary Value Problem). Assume that f(t, x, y) is continuous on
the region $R = \{(t, x, y) : a \le t \le b,\ -\infty < x < \infty,\ -\infty < y < \infty\}$ and that
$\partial f/\partial x = f_x(t, x, y)$ and $\partial f/\partial y = f_y(t, x, y)$ are continuous on R. If there exists a
constant M > 0 for which $f_x$ and $f_y$ satisfy

(3)  $f_x(t, x, y) > 0$  for all  $(t, x, y) \in R$  and

(4)  $|f_y(t, x, y)| \le M$  for all  $(t, x, y) \in R$,

then the boundary value problem

(5)  $x'' = f(t, x, x')$  with  $x(a) = \alpha$  and  $x(b) = \beta$

has a unique solution x = x(t) for $a \le t \le b$.

The notation y = x'(t) has been used to distinguish the third variable of the
function f(t, x, x'). Finally, the special case of linear differential equations is worthy of
mention.

Corollary 9.1 (Linear Boundary Value Problem). Assume that f in Theorem 9.8
has the form f(t, x, y) = p(t)y + q(t)x + r(t) and that f and its partial derivatives
$\partial f/\partial x = q(t)$ and $\partial f/\partial y = p(t)$ are continuous on R. If there exists a constant M > 0
for which p(t) and q(t) satisfy

(6)  $q(t) > 0$  for all  $t \in [a, b]$,  and

(7)  $|p(t)| \le M = \max_{a \le t \le b} \{|p(t)|\}$,

then the linear boundary value problem

(8)  $x'' = p(t)x'(t) + q(t)x(t) + r(t)$  with  $x(a) = \alpha$  and  $x(b) = \beta$

has a unique solution x = x(t) over $a \le t \le b$.

Reduction to Two I.V.P.'s: Linear Shooting Method

Finding the solution of a linear boundary problem is assisted by the linear structure of
the equation and the use of two special initial value problems. Suppose that u(t) is the
unique solution to the I.V.P.

(9)  $u'' = p(t)u'(t) + q(t)u(t) + r(t)$  with  $u(a) = \alpha$  and  $u'(a) = 0$.

Furthermore, suppose that v(t) is the unique solution to the I.V.P.

(10)  $v'' = p(t)v'(t) + q(t)v(t)$  with  $v(a) = 0$  and  $v'(a) = 1$.

Then the linear combination

(11)  $x(t) = u(t) + Cv(t)$

is a solution to $x'' = p(t)x'(t) + q(t)x(t) + r(t)$ as seen by the computation

      $x'' = u'' + Cv'' = p(t)u'(t) + q(t)u(t) + r(t) + p(t)Cv'(t) + q(t)Cv(t)$
           $= p(t)(u'(t) + Cv'(t)) + q(t)(u(t) + Cv(t)) + r(t)$
           $= p(t)x'(t) + q(t)x(t) + r(t)$.

The solution x(t) in equation (11) takes on the boundary values

(12)  $x(a) = u(a) + Cv(a) = \alpha + 0 = \alpha$,
      $x(b) = u(b) + Cv(b)$.

Imposing the boundary condition $x(b) = \beta$ in (12) produces $C = (\beta - u(b))/v(b)$.
Therefore, if $v(b) \ne 0$, the unique solution to (8) is

(13)  $x(t) = u(t) + \frac{\beta - u(b)}{v(b)}\,v(t)$.

Remark. If q fulfills the hypotheses of Corollary 9.1, this rules out the troublesome
solution $v(t) \equiv 0$, so that (13) is the form of the required solution. The details are left
for the reader to investigate in the exercises.
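
The construction in (9) through (13) can be sketched as a short script. It is a minimal illustration, not the text's Programs 9.9 and 9.10: it uses the coefficients $p(t) = 2t/(1 + t^2)$, $q(t) = -2/(1 + t^2)$, $r(t) = 1$ and the boundary data x(0) = 1.25, x(4) = -0.95 of Example 9.17, integrates both initial value problems together with the classical Runge-Kutta method, and then forms the combination (13). The state layout W = [u u' v v'] and the names F, C, and X are assumptions made here.

```matlab
% Sketch: linear shooting for x'' = p x' + q x + r, x(a) = alpha, x(b) = beta.
p = @(t) 2*t/(1 + t^2);
q = @(t) -2/(1 + t^2);
r = @(t) 1;
a = 0;  b = 4;  alpha = 1.25;  beta = -0.95;
h = 0.2;  M = round((b - a)/h);
T = a + (0:M)*h;
% W = [u u' v v']; rows 1-2 encode I.V.P. (9), rows 3-4 encode I.V.P. (10)
F = @(t,W) [W(2), p(t)*W(2) + q(t)*W(1) + r(t), W(4), p(t)*W(4) + q(t)*W(3)];
W = zeros(M+1,4);
W(1,:) = [alpha 0 0 1];           % u(a) = alpha, u'(a) = 0, v(a) = 0, v'(a) = 1
for k = 1:M
    k1 = F(T(k),       W(k,:));
    k2 = F(T(k) + h/2, W(k,:) + h*k1/2);
    k3 = F(T(k) + h/2, W(k,:) + h*k2/2);
    k4 = F(T(k) + h,   W(k,:) + h*k3);
    W(k+1,:) = W(k,:) + h*(k1 + 2*k2 + 2*k3 + k4)/6;
end
C = (beta - W(end,1))/W(end,3);   % C = (beta - u(b))/v(b), requires v(b) ~= 0
X = W(:,1) + C*W(:,3);            % x_j = u_j + C v_j, formula (13)
```

Because C is computed from the numerical values $u_M$ and $v_M$, the right boundary condition is matched to rounding error; the accuracy at interior points is governed by the Runge-Kutta step size.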


Example 9.17. Solve the boundary value problem

      $x''(t) = \frac{2t}{1 + t^2}\,x'(t) - \frac{2}{1 + t^2}\,x(t) + 1$

with x(0) = 1.25 and x(4) = -0.95 over the interval [0, 4].

The functions p, q, and r are $p(t) = 2t/(1 + t^2)$, $q(t) = -2/(1 + t^2)$, and
r(t) = 1, respectively. The Runge-Kutta method of order 4 with step size h = 0.2
is used to construct numerical solutions $\{u_j\}$ and $\{v_j\}$ to equations (9) and (10),
respectively. The approximations $\{u_j\}$ for u(t) are given in the first column of Table 9.15. Then
$u(4) \approx u_{20} = -2.893535$ and $v(4) \approx v_{20} = 4$ are used with (13) to construct

      $w_j = \frac{\beta - u(4)}{v(4)}\,v_j = 0.485884\,v_j$.

Table 9.15 The Approximate Solutions $\{x_j\} = \{u_j + w_j\}$ to
the Equation $x''(t) = \frac{2t}{1 + t^2}\,x'(t) - \frac{2}{1 + t^2}\,x(t) + 1$

   t_j        u_j           w_j        x_j = u_j + w_j
   0.0      1.250000      0.000000       1.250000
   0.2      1.220131      0.097177       1.317308
   0.4      1.132073      0.194353       1.326426
   0.6      0.990122      0.291530       1.281652
   0.8      0.800569      0.388707       1.189276
   1.0      0.570844      0.485884       1.056728
   1.2      0.308850      0.583061       0.891911
   1.4      0.022522      0.680237       0.702759
   1.6     -0.280424      0.777413       0.496989
   1.8     -0.592609      0.874591       0.281982
   2.0     -0.907039      0.971767       0.064728
   2.2     -1.217121      1.068944      -0.148177
   2.4     -1.516639      1.166121      -0.350518
   2.6     -1.799740      1.263297      -0.536443
   2.8     -2.060904      1.360474      -0.700430
   3.0     -2.294916      1.457651      -0.837265
   3.2     -2.496842      1.554828      -0.942014
   3.4     -2.662004      1.652004      -1.010000
   3.6     -2.785960      1.749181      -1.036779
   3.8     -2.864481      1.846358      -1.018123
   4.0     -2.893535      1.943535      -0.950000


Then the required approximate solution is {xj} ~ { uj 4- wj \. Sample computations arc 
given in Table 9.15, and Figure 9.24 shows their graphs. The reader can verify that v{i > = ; 
is the analytic solution for boundary value problem (10); that is. 


v"(0 


21 

IT? 


v'(t) 


2 




1 +r 


with the initial conditions v(0) = 0 and i/(0) = L 

The approximations in Table 9.16 compare numerical solutions obtained with the linear 
shooting method with the step sizes h = 0.2 and h — 0.1 and the analytic solution 


x(f) = !.25 + 0.4860896526/ - 2.25/ 2 + 2i arctan(r) - l - ln(l + t 1 ) + ln(l + f 2 ). 

A graph of the approximate solution when k = 0.2 is given in Figure 9.25. Included in 
the table are columns for the error. Since the Runge-Kutta solutions have error of order 
0(h 4 ), the error in the solution with the smaller step size h = 0.1 is about ~ the error of 
the solution with the large step size k = 0.2. ■ 


Table 9.16 Numerical Approximations for $x''(t) = \frac{2t}{1+t^2}x'(t) - \frac{2}{1+t^2}x(t) + 1$

        h = 0.2      exact        error        |         h = 0.1      exact        error
t_j     x_j          x(t_j)       x(t_j)-x_j   |  t_j    x_j          x(t_j)       x(t_j)-x_j
0.0     1.250000     1.250000     0.000000     |  0.0    1.250000     1.250000     0.000000
                                               |  0.1    1.291116     1.291117     0.000001
0.2     1.317308     1.317350     0.000042     |  0.2    1.317348     1.317350     0.000002
                                               |  0.3    1.328986     1.328990     0.000004
0.4     1.326426     1.326505     0.000079     |  0.4    1.326500     1.326505     0.000005
                                               |  0.5    1.310508     1.310514     0.000006
0.6     1.281652     1.281762     0.000110     |  0.6    1.281756     1.281762     0.000006
0.8     1.189276     1.189412     0.000136     |  0.8    1.189404     1.189412     0.000008
1.0     1.056728     1.056886     0.000158     |  1.0    1.056876     1.056886     0.000010
1.2     0.891911     0.892086     0.000175     |  1.2    0.892076     0.892086     0.000010
1.6     0.496989     0.497187     0.000198     |  1.6    0.497175     0.497187     0.000012
2.0     0.064728     0.064931     0.000203     |  2.0    0.064919     0.064931     0.000012
2.4    -0.350518    -0.350325     0.000193     |  2.4   -0.350337    -0.350325     0.000012
2.8    -0.700430    -0.700262     0.000168     |  2.8   -0.700273    -0.700262     0.000011
3.2    -0.942014    -0.941888     0.000126     |  3.2   -0.941895    -0.941888     0.000007
3.6    -1.036779    -1.036708     0.000071     |  3.6   -1.036713    -1.036708     0.000005
4.0    -0.950000    -0.950000     0.000000     |  4.0   -0.950000    -0.950000     0.000000

Program 9.10 will call Program 9.9 to solve the initial value problems (9) and (10).
Program 9.9 approximates solutions of systems of differential equations using a modification
of the Runge-Kutta method of order N = 4. Thus, it is necessary to save
the equations (9) and (10) in the form of the system of equations (11) of Section 9.7.
As an illustration, consider the boundary value problem in Example 9.17. The following
M-file, named F1, will save the I.V.P. (9) in the form of a system of differential
equations.

function Z=F1(t,Z)
x=Z(1);y=Z(2);
Z=[y,2*t*y/(1+t^2)-2*x/(1+t^2)+1];

A similar M-file, named F2, will save the I.V.P. (10) (just let r(t) = 0 in F1) in the
appropriate form.

A plot of the approximation obtained from Program 9.10 can be constructed by
using the command plot(L(:,1),L(:,2)).
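The linear shooting computation for the boundary value problem of Example 9.17 can be sketched as a single script. This is only an illustration under stated assumptions: the right-hand sides are written as anonymous functions (the names F1h and F2h are hypothetical, standing in for the M-files F1 and F2), and the classical fourth-order Runge-Kutta step is written inline rather than calling rks4. The quantity err compares the result with the analytic solution quoted in Example 9.17; Table 9.16 lists 1.056728 for the entry at t = 1.0 with h = 0.2.

```matlab
% Sketch of linear shooting for x'' = 2t/(1+t^2) x' - 2/(1+t^2) x + 1,
% x(0) = 1.25, x(4) = -0.95.  F1h and F2h are hypothetical names.
F1h = @(t,Z) [Z(2), 2*t*Z(2)/(1+t^2) - 2*Z(1)/(1+t^2) + 1];  % I.V.P. (9)
F2h = @(t,Z) [Z(2), 2*t*Z(2)/(1+t^2) - 2*Z(1)/(1+t^2)];      % I.V.P. (10)
a = 0; b = 4; alpha = 1.25; beta = -0.95; M = 20;
h = (b-a)/M;
T = a + h*(0:M);
U = zeros(M+1,2); V = zeros(M+1,2);
U(1,:) = [alpha, 0];   % u(a) = alpha, u'(a) = 0
V(1,:) = [0, 1];       % v(a) = 0,     v'(a) = 1
for j = 1:M
    % one classical RK4 step for u
    k1 = h*F1h(T(j), U(j,:));
    k2 = h*F1h(T(j)+h/2, U(j,:)+k1/2);
    k3 = h*F1h(T(j)+h/2, U(j,:)+k2/2);
    k4 = h*F1h(T(j)+h, U(j,:)+k3);
    U(j+1,:) = U(j,:) + (k1+2*k2+2*k3+k4)/6;
    % one classical RK4 step for v
    k1 = h*F2h(T(j), V(j,:));
    k2 = h*F2h(T(j)+h/2, V(j,:)+k1/2);
    k3 = h*F2h(T(j)+h/2, V(j,:)+k2/2);
    k4 = h*F2h(T(j)+h, V(j,:)+k3);
    V(j+1,:) = V(j,:) + (k1+2*k2+2*k3+k4)/6;
end
% combine: x = u + (beta - u(b))/v(b) * v
X = U(:,1) + (beta - U(M+1,1))*V(:,1)/V(M+1,1);
% analytic solution quoted in Example 9.17
xe = @(t) 1.25 + 0.4860896526*t - 2.25*t.^2 + 2*t.*atan(t) ...
     - 0.5*log(1+t.^2) + 0.5*t.^2.*log(1+t.^2);
err = max(abs(xe(T') - X));
fprintf('x(1.0) approx %.6f, max error %.2e\n', X(6), err);
```

Because v(t) = t is reproduced exactly by the Runge-Kutta step, all of the discretization error in X comes from the u component.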


Program 9.9 (Runge-Kutta Method of Order N = 4 for Systems). To approximate
the solution of the system of differential equations

$$x_1'(t) = f_1(t, x_1(t), \ldots, x_n(t))$$
$$\vdots$$
$$x_n'(t) = f_n(t, x_1(t), \ldots, x_n(t))$$

with $x_1(a) = \alpha_1, \ldots, x_n(a) = \alpha_n$ over the interval [a, b].

function [T,Z]=rks4(F,a,b,Za,M)
%Input  - F is the system input as a string 'F'
%       - a and b are the end points of the interval
%       - Za=[x(a) y(a)] are the initial conditions
%       - M is the number of steps
%Output - T is the vector of steps
%       - Z=[x1(t) ... xn(t)]; where xk(t) is the approximation
%         to the kth dependent variable
h=(b-a)/M;
T=zeros(1,M+1);
Z=zeros(M+1,length(Za));
T=a:h:b;
Z(1,:)=Za;
for j=1:M
   k1=h*feval(F,T(j),Z(j,:));
   k2=h*feval(F,T(j)+h/2,Z(j,:)+k1/2);
   k3=h*feval(F,T(j)+h/2,Z(j,:)+k2/2);
   k4=h*feval(F,T(j)+h,Z(j,:)+k3);
   Z(j+1,:)=Z(j,:)+(k1+2*k2+2*k3+k4)/6;
end

Program 9.10 (Linear Shooting Method). To approximate the solution of the
boundary value problem $x'' = p(t)x'(t) + q(t)x(t) + r(t)$ with $x(a) = \alpha$ and
$x(b) = \beta$ over the interval [a, b] by using the Runge-Kutta method of order N = 4.

function L=linsht(F1,F2,a,b,alpha,beta,M)
%Input  - F1 and F2 are the systems of first-order equations
%         representing the I.V.P.'s (9) and (10), respectively;
%         input as strings 'F1', 'F2'
%       - a and b are the end points of the interval
%       - alpha = x(a) and beta = x(b); boundary conditions
%       - M is the number of steps
%Output - L=[T' X]; where T' is the (M+1)x1 vector of
%         abscissas and X is the (M+1)x1 vector of ordinates
%Solve the system F1
Za=[alpha,0];
[T,Z]=rks4(F1,a,b,Za,M);
U=Z(:,1);
%Solve the system F2
Za=[0,1];
[T,Z]=rks4(F2,a,b,Za,M);
V=Z(:,1);
%Calculate the solution to the boundary value problem
X=U+(beta-U(M+1))*V/V(M+1);
L=[T' X];

Exercises for Boundary Value Problems


1. Verify that the function x(t) is the solution to the boundary value problem.

(a) $x'' = (-2/t)x' + (2/t^2)x + (10\cos(\ln(t)))/t^2$ over [1, 3] with x(1) = 1 and
x(3) = -1.

$$x(t) = \frac{4.335950689 - 0.3359506908\,t^3 - 3t^2\cos(\ln(t)) + t^2\sin(\ln(t))}{t^2}$$

(b) $x'' = -2x' - 2x + e^{-t} + \sin(2t)$ over [0, 4] with x(0) = 0.6 and x(4) = -0.1.

$$x(t) = \tfrac{1}{5} + e^{-t} - \tfrac{1}{5}e^{-t}\cos(t) - \tfrac{2}{5}\cos^2(t) + 3.670227413\,e^{-t}\sin(t) - \tfrac{1}{5}\cos(t)\sin(t)$$

(c) $x'' = -4x' - 4x + 5\cos(4t) + \sin(2t)$ over [0, 2] with x(0) = 0.75 and x(2) =
0.25.

$$x(t) = -\tfrac{1}{40} + 1.025\,e^{-2t} - 1.915729975\,t e^{-2t} + \tfrac{19}{20}\cos^2(t) - \tfrac{6}{5}\cos^4(t) - \tfrac{4}{5}\cos(t)\sin(t) + \tfrac{8}{5}\cos^3(t)\sin(t)$$

(d) $x'' + (1/t)x' + (1 - 1/(4t^2))x = 0$ over [1, 6] with x(1) = 1 and x(6) = 0.

$$x(t) = \frac{0.2913843206\cos(t) + 1.001299385\sin(t)}{\sqrt{t}}$$

(e) $x'' - (1/t)x' + (1/t^2)x = 1$ over [0.5, 4.5] with x(0.5) = 1 and x(4.5) = 2.

$$x(t) = t^2 - 0.2525826491\,t - 2.528442297\,t\ln(t)$$

2. Does the boundary value problem in Exercise 1(e) satisfy the hypotheses of Corollary 9.1?
Explain.

3. If q fulfills the hypothesis of Corollary 9.1, show that $v(t) \equiv 0$ is the unique solution
to the boundary value problem

$$v'' = p(t)v'(t) + q(t)v(t) \quad\text{with } v(a) = 0 \text{ and } v(b) = 0.$$


Algorithms and Programs


1. (a) Use Programs 9.9 and 9.10 to solve each of the boundary value problems in
Exercise 1, using the step size h = 0.05.

(b) Graph your solution and the actual solution on the same coordinate system.

2. Construct programs analogous to Program 9.9 based on

(a) Heun's method,

(b) the Adams-Bashforth-Moulton method, and

(c) Hamming's method.

3. (a) Modify Program 9.10 to call each of your programs from Problem 2.

(b) Use your programs to solve each of the five boundary value problems in Exercise 1
using the step size h = 0.05.

(c) Graph your solutions and the actual solution on the same coordinate system.


9.9 Finite-difference Method


Methods involving difference quotient approximations for derivatives can be used for
solving certain second-order boundary value problems. Consider the linear equation

(1) $x'' = p(t)x'(t) + q(t)x(t) + r(t)$

over [a, b] with $x(a) = \alpha$ and $x(b) = \beta$. Form a partition of [a, b] using the points
$a = t_0 < t_1 < \cdots < t_N = b$, where $h = (b - a)/N$ and $t_j = a + jh$ for $j = 0, 1, \ldots, N$.
The central-difference formulas discussed in Chapter 6 are used to approximate
the derivatives

(2) $x'(t_j) = \dfrac{x(t_{j+1}) - x(t_{j-1})}{2h} + O(h^2)$

and

(3) $x''(t_j) = \dfrac{x(t_{j+1}) - 2x(t_j) + x(t_{j-1})}{h^2} + O(h^2).$

To start the derivation, we replace each term $x(t_j)$ on the right side of (2) and (3)
with $x_j$, and the resulting equations are substituted into (1) to obtain the relation

(4) $\dfrac{x_{j+1} - 2x_j + x_{j-1}}{h^2} + O(h^2) = p(t_j)\left(\dfrac{x_{j+1} - x_{j-1}}{2h} + O(h^2)\right) + q(t_j)x_j + r(t_j).$

Next, we drop the two terms $O(h^2)$ in (4) and introduce the notation $p_j = p(t_j)$,
$q_j = q(t_j)$, and $r_j = r(t_j)$; this produces the difference equation

(5) $\dfrac{x_{j+1} - 2x_j + x_{j-1}}{h^2} = p_j\,\dfrac{x_{j+1} - x_{j-1}}{2h} + q_j x_j + r_j,$

which is used to compute numerical approximations to the differential equation (1).
This is carried out by multiplying each side of (5) by $h^2$ and then collecting terms
involving $x_{j-1}$, $x_j$, and $x_{j+1}$ and arranging them in a system of linear equations:

(6) $\left(-\dfrac{h}{2}p_j - 1\right)x_{j-1} + (2 + h^2 q_j)x_j + \left(\dfrac{h}{2}p_j - 1\right)x_{j+1} = -h^2 r_j$

for $j = 1, 2, \ldots, N-1$, where $x_0 = \alpha$ and $x_N = \beta$. The system in (6) has the familiar
tridiagonal form, which is more visible when displayed with matrix notation:


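$$
\begin{bmatrix}
2 + h^2 q_1 & \frac{h}{2}p_1 - 1 & & & \\
-\frac{h}{2}p_2 - 1 & 2 + h^2 q_2 & \frac{h}{2}p_2 - 1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & -\frac{h}{2}p_{N-2} - 1 & 2 + h^2 q_{N-2} & \frac{h}{2}p_{N-2} - 1 \\
 & & & -\frac{h}{2}p_{N-1} - 1 & 2 + h^2 q_{N-1}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{N-2} \\ x_{N-1} \end{bmatrix}
=
\begin{bmatrix} -h^2 r_1 + e_0 \\ -h^2 r_2 \\ \vdots \\ -h^2 r_{N-2} \\ -h^2 r_{N-1} + e_N \end{bmatrix},
$$

where the general row $j$ has the entries $-\frac{h}{2}p_j - 1$, $2 + h^2 q_j$, and $\frac{h}{2}p_j - 1$, and

$$e_0 = \left(\frac{h}{2}p_1 + 1\right)\alpha \quad\text{and}\quad e_N = \left(-\frac{h}{2}p_{N-1} + 1\right)\beta.$$

When computations with step size h are used, the numerical approximation to the
solution is a set of discrete points $\{(t_j, x_j)\}$; if the analytic solution $x(t_j)$ is known, we
can compare $x_j$ and $x(t_j)$.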
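The assembly of system (6) can be sketched for the boundary value problem $x'' = \frac{2t}{1+t^2}x' - \frac{2}{1+t^2}x + 1$, x(0) = 1.25, x(4) = -0.95, treated with linear shooting in Example 9.17. This is a minimal illustration, not the book's program: the coefficient functions are anonymous functions, the matrix is stored densely with diag, and MATLAB's backslash operator is used in place of a dedicated tridiagonal solver. These choices are assumptions made for brevity.

```matlab
% Finite-difference sketch for system (6); dense storage for clarity only.
p = @(t) 2*t./(1+t.^2);
q = @(t) -2./(1+t.^2);
r = @(t) ones(size(t));
a = 0; b = 4; alpha = 1.25; beta = -0.95; N = 20;
h = (b-a)/N;
t = a + h*(1:N-1);                 % interior mesh points t_1,...,t_{N-1}
d   = 2 + h^2*q(t);                % main diagonal, rows 1..N-1
sub = -h/2*p(t(2:end)) - 1;        % subdiagonal, rows 2..N-1
sup =  h/2*p(t(1:end-1)) - 1;      % superdiagonal, rows 1..N-2
rhs = -h^2*r(t);
rhs(1)   = rhs(1)   + ( h/2*p(t(1))   + 1)*alpha;   % e_0
rhs(end) = rhs(end) + (-h/2*p(t(end)) + 1)*beta;    % e_N
A = diag(d) + diag(sub,-1) + diag(sup,1);
x = A \ rhs.';
T = [a, t, b].';
X = [alpha; x; beta];
% analytic solution of this boundary value problem
xe = @(t) 1.25 + 0.4860896526*t - 2.25*t.^2 + 2*t.*atan(t) ...
     - 0.5*log(1+t.^2) + 0.5*t.^2.*log(1+t.^2);
err = max(abs(xe(T) - X));
fprintf('x(1.0) approx %.6f, max error %.6f\n', X(6), err);
```

For large N the dense matrix is wasteful; storing only the three diagonals, as the tridiagonal form suggests, keeps the work proportional to N.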

Example 9.18. Solve the boundary value problem

$$x''(t) = \frac{2t}{1+t^2}\,x'(t) - \frac{2}{1+t^2}\,x(t) + 1$$

with x(0) = 1.25 and x(4) = -0.95 over the interval [0, 4].

The functions p, q, and r are $p(t) = 2t/(1+t^2)$, $q(t) = -2/(1+t^2)$, and $r(t) = 1$,
respectively. The finite-difference method is used to construct numerical solutions $\{x_j\}$ using
the system of equations (6). Sample values of the approximations $\{x_{j,1}\}$, $\{x_{j,2}\}$, $\{x_{j,3}\}$,
and $\{x_{j,4}\}$ corresponding to the step sizes $h_1 = 0.2$, $h_2 = 0.1$, $h_3 = 0.05$, and $h_4 = 0.025$
are given in Table 9.17. Figure 9.26 shows the graph of the polygonal path formed from
$\{(t_j, x_{j,1})\}$ for the case $h_1 = 0.2$. There are 41 terms in the sequence generated with
$h_2 = 0.1$, and the sequence $\{x_{j,2}\}$ only includes every other term from these computations;
they correspond to the 21 values of $\{t_j\}$ given in Table 9.17. Similarly, the sequences $\{x_{j,3}\}$
and $\{x_{j,4}\}$ are a portion of the values generated with step sizes $h_3 = 0.05$ and $h_4 = 0.025$,
respectively, and they correspond to the 21 values of $\{t_j\}$ in Table 9.17.

Table 9.17 Numerical Approximations for $x''(t) = \frac{2t}{1+t^2}x'(t) - \frac{2}{1+t^2}x(t) + 1$

        x_{j,1}      x_{j,2}      x_{j,3}      x_{j,4}      x(t_j)
t_j     h = 0.2      h = 0.1      h = 0.05     h = 0.025    exact
0.0     1.250000     1.250000     1.250000     1.250000     1.250000
0.2     1.314503     1.316646     1.317174     1.317306     1.317350
0.4     1.320607     1.325045     1.326141     1.326414     1.326505
0.6     1.272755     1.279533     1.281206     1.281623     1.281762
0.8     1.177399     1.186438     1.188670     1.189227     1.189412
1.0     1.042106     1.053226     1.055973     1.056658     1.056886
1.2     0.874878     0.887823     0.891023     0.891821     0.892086
1.4     0.683712     0.698181     0.701758     0.702650     0.702947
1.6     0.476372     0.492027     0.495900     0.496865     0.497187
1.8     0.260264     0.276749     0.280828     0.281846     0.282184
2.0     0.042399     0.059343     0.063537     0.064583     0.064931
2.2    -0.170616    -0.153592    -0.149378    -0.148327    -0.147977
2.4    -0.372557    -0.355841    -0.351702    -0.350669    -0.350325
2.6    -0.557565    -0.541546    -0.537580    -0.536590    -0.536261
2.8    -0.720114    -0.705188    -0.701492    -0.700570    -0.700262
3.0    -0.854988    -0.841551    -0.838223    -0.837393    -0.837116
3.2    -0.957250    -0.945700    -0.942839    -0.942125    -0.941888
3.4    -1.022221    -1.012958    -1.010662    -1.010090    -1.009899
3.6    -1.045457    -1.038880    -1.037250    -1.036844    -1.036709
3.8    -1.022727    -1.019238    -1.018373    -1.018158    -1.018086
4.0    -0.950000    -0.950000    -0.950000    -0.950000    -0.950000

Figure 9.26 The graph of the numerical approximation for x(t) = u(t) + w(t), which is
the solution to $x''(t) = \frac{2t}{1+t^2}x'(t) - \frac{2}{1+t^2}x(t) + 1$ (using h = 0.2).

Next we compare numerical solutions in Table 9.17 with the analytic solution
$x(t) = 1.25 + 0.486089652\,t - 2.25\,t^2 + 2t\arctan(t) - \tfrac{1}{2}\ln(1+t^2) + \tfrac{1}{2}t^2\ln(1+t^2)$.
The numerical solutions can be shown to have error of order $O(h^2)$. Hence reducing the step size by a
factor of 1/2 results in the error being reduced by about 1/4. A careful scrutiny of Table 9.18
will reveal that this is happening. For instance, at $t_j = 1.0$ the errors incurred with step
sizes $h_1$, $h_2$, $h_3$, and $h_4$ are $e_{j,1} = 0.014780$, $e_{j,2} = 0.003660$, $e_{j,3} = 0.000913$, and
$e_{j,4} = 0.000228$, respectively. Their successive ratios $e_{j,2}/e_{j,1} = 0.003660/0.014780 = 0.2476$,
$e_{j,3}/e_{j,2} = 0.000913/0.003660 = 0.2495$, and $e_{j,4}/e_{j,3} = 0.000228/0.000913 = 0.2497$
are approaching 1/4.

Finally, we show how Richardson's improvement scheme can be used to extrapolate
the seemingly inaccurate sequences $\{x_{j,1}\}$, $\{x_{j,2}\}$, $\{x_{j,3}\}$, and $\{x_{j,4}\}$ and obtain six digits
of precision. Eliminate the error terms $O(h^2)$ and $O((h/2)^2)$ in the approximations $\{x_{j,1}\}$
and $\{x_{j,2}\}$ by generating the extrapolated sequence $\{z_{j,1}\} = \{(4x_{j,2} - x_{j,1})/3\}$. Similarly,
the error terms $O((h/2)^2)$ and $O((h/4)^2)$ for $\{x_{j,2}\}$ and $\{x_{j,3}\}$ are eliminated by generating
$\{z_{j,2}\} = \{(4x_{j,3} - x_{j,2})/3\}$. It has been shown that the second level of Richardson's
improvement scheme applies to the sequences $\{z_{j,1}\}$ and $\{z_{j,2}\}$, so the third improvement
is $\{(16z_{j,2} - z_{j,1})/15\}$ (see Reference [41]). Let us illustrate the situation by finding the
extrapolated values that correspond to $t_j = 1.0$. The first extrapolated value is

$$\frac{4x_{j,2} - x_{j,1}}{3} = \frac{4(1.053226) - 1.042106}{3} = 1.056932 = z_{j,1}.$$

The second extrapolated value is

$$\frac{4x_{j,3} - x_{j,2}}{3} = \frac{4(1.055973) - 1.053226}{3} = 1.056889 = z_{j,2}.$$

Finally, the third extrapolation involves the terms $z_{j,1}$ and $z_{j,2}$:

$$\frac{16z_{j,2} - z_{j,1}}{15} = \frac{16(1.056889) - 1.056932}{15} = 1.056886.$$

This last computation contains six decimal places of accuracy. The values at the other
points are given in Table 9.19. ■
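The extrapolation arithmetic at $t_j = 1.0$ can be reproduced in a few lines. The sketch below only uses the three tabulated values from Table 9.17 and the exact value 1.056886; the variable names are chosen for illustration.

```matlab
% Richardson improvement at t_j = 1.0 using values from Table 9.17
x1 = 1.042106;                 % h = 0.2
x2 = 1.053226;                 % h = 0.1
x3 = 1.055973;                 % h = 0.05
z1 = (4*x2 - x1)/3;            % removes the O(h^2) term from x1, x2
z2 = (4*x3 - x2)/3;            % removes the O(h^2) term from x2, x3
z3 = (16*z2 - z1)/15;          % second level of the improvement scheme
xexact = 1.056886;
fprintf('z1 = %.6f  z2 = %.6f  z3 = %.6f\n', z1, z2, z3);
fprintf('errors: %.1e  %.1e  %.1e\n', abs(xexact - [z1 z2 z3]));
```

Each level costs only a handful of arithmetic operations, which is why extrapolation is attractive compared with recomputing the solution on a much finer mesh.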

Table 9.18 Errors in Numerical Approximations Using the Finite-difference Method

        x(t_j)-x_{j,1}   x(t_j)-x_{j,2}   x(t_j)-x_{j,3}   x(t_j)-x_{j,4}
        = e_{j,1}        = e_{j,2}        = e_{j,3}        = e_{j,4}
t_j     h_1 = 0.2        h_2 = 0.1        h_3 = 0.05       h_4 = 0.025
0.0     0.000000         0.000000         0.000000         0.000000
0.2     0.002847         0.000704         0.000176         0.000044
0.4     0.005898         0.001460         0.000364         0.000091
0.6     0.009007         0.002229         0.000556         0.000139
0.8     0.012013         0.002974         0.000742         0.000185
1.0     0.014780         0.003660         0.000913         0.000228
1.2     0.017208         0.004263         0.001063         0.000265
1.4     0.019235         0.004766         0.001189         0.000297
1.6     0.020815         0.005160         0.001287         0.000322
1.8     0.021920         0.005435         0.001356         0.000338
2.0     0.022533         0.005588         0.001394         0.000348
2.2     0.022639         0.005615         0.001401         0.000350
2.4     0.022232         0.005516         0.001377         0.000344
2.6     0.021304         0.005285         0.001319         0.000329
2.8     0.019852         0.004926         0.001230         0.000308
3.0     0.017872         0.004435         0.001107         0.000277
3.2     0.015362         0.003812         0.000951         0.000237
3.4     0.012322         0.003059         0.000763         0.000191
3.6     0.008749         0.002171         0.000541         0.000135
3.8     0.004641         0.001152         0.000287         0.000072
4.0     0.000000         0.000000         0.000000         0.000000

Table 9.19 Extrapolation of the Numerical Approximations {x_{j,1}}, {x_{j,2}}, {x_{j,3}} Obtained
with the Finite-difference Method

        (4x_{j,2}-x_{j,1})/3   (4x_{j,3}-x_{j,2})/3   (16z_{j,2}-z_{j,1})/15   x(t_j)
t_j     = z_{j,1}              = z_{j,2}                                       Exact solution
0.0     1.250000               1.250000               1.250000                 1.250000
0.2     1.317360               1.317351               1.317350                 1.317350
0.4     1.326524               1.326506               1.326504                 1.326505
0.6     1.281792               1.281764               1.281762                 1.281762
0.8     1.189451               1.189414               1.189412                 1.189412
1.0     1.056932               1.056889               1.056886                 1.056886
1.2     0.892138               0.892090               0.892086                 0.892086
1.4     0.703003               0.702951               0.702947                 0.702948
1.6     0.497246               0.497191               0.497187                 0.497187
1.8     0.282244               0.282188               0.282184                 0.282184
2.0     0.064991               0.064935               0.064931                 0.064931
2.2    -0.147918              -0.147973              -0.147977                -0.147977
2.4    -0.350268              -0.350322              -0.350325                -0.350325
2.6    -0.536207              -0.536258              -0.536261                -0.536261
2.8    -0.700213              -0.700259              -0.700263                -0.700262
3.0    -0.837072              -0.837113              -0.837116                -0.837116
3.2    -0.941850              -0.941885              -0.941888                -0.941888
3.4    -1.009870              -1.009898              -1.009899                -1.009899
3.6    -1.036688              -1.036707              -1.036708                -1.036708
3.8    -1.018075              -1.018085              -1.018086                -1.018086
4.0    -0.950000              -0.950000              -0.950000                -0.950000

Program 9.12 will call Program 9.11 to solve the tridiagonal system (6). Program 9.12
requires that the coefficient functions p(t), q(t), and r(t) (boundary value
problem (1)) be saved in M-files p.m, q.m, and r.m, respectively.

Program 9.11 (Tridiagonal Systems). To solve the tridiagonal system CX = B,
where C is a tridiagonal matrix.

function X=trisys(A,D,C,B)
%Input  - A is the subdiagonal of the coefficient matrix
%       - D is the main diagonal of the coefficient matrix
%       - C is the superdiagonal of the coefficient matrix
%       - B is the constant vector of the linear system
%Output - X is the solution vector
N=length(B);
for k=2:N
   mult=A(k-1)/D(k-1);
   D(k)=D(k)-mult*C(k-1);
   B(k)=B(k)-mult*B(k-1);
end
X(N)=B(N)/D(N);
for k=N-1:-1:1
   X(k)=(B(k)-C(k)*X(k+1))/D(k);
end
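The elimination and back-substitution steps of Program 9.11 can be tried on a small system whose solution is known in advance. The sketch below repeats those steps inline on a hypothetical 4-by-4 tridiagonal system (main diagonal 4, off-diagonals 1) constructed so that the solution is a vector of ones; the data are illustrative and do not come from the text.

```matlab
% Small tridiagonal test system with known solution [1 1 1 1]
A = [1 1 1];        % subdiagonal
D = [4 4 4 4];      % main diagonal
C = [1 1 1];        % superdiagonal
B = [5 6 6 5];      % right-hand side = row sums of the matrix
N = length(B);
X = zeros(1,N);
% forward elimination: remove the subdiagonal entries
for k = 2:N
    mult = A(k-1)/D(k-1);
    D(k) = D(k) - mult*C(k-1);
    B(k) = B(k) - mult*B(k-1);
end
% back substitution
X(N) = B(N)/D(N);
for k = N-1:-1:1
    X(k) = (B(k) - C(k)*X(k+1))/D(k);
end
disp(X)
```

No pivoting is performed, so the procedure relies on the pivots D(k) staying away from zero, which holds for instance when the matrix is strictly diagonally dominant.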


Program 9.12 (Finite-difference Method). To approximate the solution of the
boundary value problem $x'' = p(t)x'(t) + q(t)x(t) + r(t)$ with $x(a) = \alpha$ and
$x(b) = \beta$ over the interval [a, b] by using the finite-difference method of order
$O(h^2)$.
Remark. The mesh is $a = t_1 < \cdots < t_{N+1} = b$ and the solution points are
$\{(t_j, x_j)\}_{j=1}^{N+1}$.

function F=findiff(p,q,r,a,b,alpha,beta,N)
%Input  - p, q, and r are the coefficient functions of (1)
%         input as strings; 'p', 'q', 'r'
%       - a and b are the left and right end points
%       - alpha=x(a) and beta=x(b)
%       - N is the number of steps
%Output - F=[T' X']: where T' is the (N+1)x1 vector of abscissas
%         and X' is the (N+1)x1 vector of ordinates
%Initialize vectors and h
T=zeros(1,N+1);
X=zeros(1,N-1);
Va=zeros(1,N-2);
Vb=zeros(1,N-1);
Vc=zeros(1,N-2);
Vd=zeros(1,N-1);
h=(b-a)/N;
%Calculate the constant vector B in AX=B
Vt=a+h:h:a+h*(N-1);
Vb=-h^2*feval(r,Vt);
Vb(1)=Vb(1)+(1+h/2*feval(p,Vt(1)))*alpha;
Vb(N-1)=Vb(N-1)+(1-h/2*feval(p,Vt(N-1)))*beta;
%Calculate the main diagonal of A in AX=B
Vd=2+h^2*feval(q,Vt);
%Calculate the subdiagonal of A in AX=B
Vta=Vt(1,2:N-1);
Va=-1-h/2*feval(p,Vta);
%Calculate the superdiagonal of A in AX=B
Vtc=Vt(1,1:N-2);
Vc=-1+h/2*feval(p,Vtc);
%Solve AX=B using trisys
X=trisys(Va,Vd,Vc,Vb);
T=[a,Vt,b];
%Append the boundary values so that T and X have equal length
X=[alpha,X,beta];
F=[T' X'];


Exercises for Finite-difference Method

In Exercises 1 through 3, use the finite-difference method to approximate x(a + 0.5).

(a) Let $h_1 = 0.5$ and do one step by hand calculation. Then let $h_2 = 0.25$ and do two
steps by hand calculation.

(b) Use extrapolation of the values in part (a) to obtain a better approximation (i.e.,
$z_{j,1} = (4x_{j,2} - x_{j,1})/3$).

(c) Compare your results from parts (a) and (b) with the exact value x(a + 0.5).

1. $x'' = 2x' - x + t^2 - 1$ over [0, 1] with x(0) = 5 and x(1) = 10
$x(t) = t^2 + 4t + 5$

2. $x'' + (1/t)x' + (1 - 1/(4t^2))x = 0$ over [1, 6] with x(1) = 1 and x(6) = 0
$x(t) = \dfrac{0.2913843206\cos(t) + 1.001299385\sin(t)}{\sqrt{t}}$

3. $x'' - (1/t)x' + (1/t^2)x = 1$ over [0.5, 4.5] with x(0.5) = 1 and x(4.5) = 2
$x(t) = t^2 - 0.2525826491\,t - 2.528442297\,t\ln(t)$

4. Assume that p, q, and r are continuous over the interval [a, b] and that q(t) > 0 for
$a \le t \le b$. If h satisfies 0 < h < 2/M, where $M = \max_{a \le t \le b}\{|p(t)|\}$, prove that
the coefficient matrix of (6) is strictly diagonally dominant and that there is a unique
solution.

5. Assume that $p(t) \equiv C_1 > 0$ and $q(t) \equiv C_2 > 0$. (a) Write out the tridiagonal linear
system for this situation. (b) Prove that the tridiagonal system is strictly diagonally
dominant and hence has a unique solution, provided that $C_1/C_2 < h$.


Algorithms and Programs

1. Use Programs 9.11 and 9.12 to solve the given boundary value problem using step sizes
h = 0.1 and h = 0.01. Plot your two approximate solutions and the actual solution
on the same coordinate system.

(a) $x'' = 2x' - x + t^2 - 1$ over [0, 1] with x(0) = 5 and x(1) = 10
$x(t) = t^2 + 4t + 5$

(b) $x'' + (1/t)x' + (1 - 1/(4t^2))x = 0$ over [1, 6] with x(1) = 1 and x(6) = 0
$x(t) = \dfrac{0.2913843206\cos(t) + 1.001299385\sin(t)}{\sqrt{t}}$

(c) $x'' - (1/t)x' + (1/t^2)x = 1$ over [0.5, 4.5] with x(0.5) = 1 and x(4.5) = 2
$x(t) = t^2 - 0.2525826491\,t - 2.528442297\,t\ln(t)$

In Problems 2 through 7, use Programs 9.11 and 9.12 to solve the given boundary value problem
using step sizes h = 0.2, h = 0.1, and h = 0.05. For each problem, graph the three
solutions on the same coordinate system.

2. $x'' = (-2/t)x' + (2/t^2)x + (10\cos(\ln(t)))/t^2$ over [1, 3] with x(1) = 1 and x(3) = -1

3. $x'' = -5x' - 6x + te^{-2t} + 3.9\cos(3t)$ over [0, 3] with x(0) = 0.95 and x(3) = 0.15

4. $x'' = -4x' - 4x + 5\cos(4t) + \sin(2t)$ over [0, 2] with x(0) = 0.75 and x(2) = 0.25

5. $x'' = -2x' - 2x + e^{-t} + \sin(2t)$ over [0, 4] with x(0) = 0.6 and x(4) = -0.1

6. $x'' + (2/t)x' - (2/t^2)x = \sin(t)/t^2$ over [1, 6] with x(1) = -0.02 and x(6) = 0.02

7. $x'' + (1/t)x' + (1 - 1/(4t^2))x = \sqrt{t}\cos(t)$ over [1, 6] with x(1) = 1.0 and x(6) = -0.5

8. Construct a program that will call Programs 9.11 and 9.12 and carry out the extrapolation
process illustrated in Example 9.18 and Table 9.19.

9. For each of the given boundary value problems, use your program from Problem 8
and the step sizes h = 0.1, h = 0.05, and h = 0.025 to construct a table analogous
to Table 9.19. Plot your extrapolated solution and the actual solution on the same
coordinate system.

(a) $x'' = 2x' - x + t^2 - 1$ over [0, 1] with x(0) = 5 and x(1) = 10
$x(t) = t^2 + 4t + 5$

(b) $x'' + (1/t)x' + (1 - 1/(4t^2))x = 0$ over [1, 6] with x(1) = 1 and x(6) = 0
$x(t) = \dfrac{0.2913843206\cos(t) + 1.001299385\sin(t)}{\sqrt{t}}$

(c) $x'' - (1/t)x' + (1/t^2)x = 1$ over [0.5, 4.5] with x(0.5) = 1 and x(4.5) = 2
$x(t) = t^2 - 0.2525826491\,t - 2.528442297\,t\ln(t)$






Solution of 

Partial Differential Equations 

Many problems in applied science, physics, and engineering are modeled mathemat¬ 
ically with partial differential equations. A differential equation involving more than 
one independent variable is called a partial differential equation (PDE). It is not nec¬ 
essary to have taken a specialized course in PDEs to understand the rudimentary prin¬ 
ciples involved in obtaining computer solutions. In this chapter we will study finite- 
difference methods which are based on formulas for approximating the first and second 
derivatives of a function. We start by classifying the three types of equations under 
investigation and introduce a physical problem for each case. A partial differential 
equation of the form 

(1) $A\Phi_{xx} + B\Phi_{xy} + C\Phi_{yy} = f(x, y, \Phi, \Phi_x, \Phi_y),$

where A, B, and C are constants, is called quasilinear. There are three types of quasilinear
equations:

(2) If $B^2 - 4AC < 0$, the equation is called elliptic.

(3) If $B^2 - 4AC = 0$, the equation is called parabolic.

(4) If $B^2 - 4AC > 0$, the equation is called hyperbolic.
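The classification rule is a single sign test on the discriminant, which the following sketch applies to three model equations. The coefficient rows are illustrative normalizations chosen here (all physical constants set to 1, with the second independent variable playing the role of y), not values taken from the text.

```matlab
% Classify A*u_xx + B*u_xy + C*u_yy = f by the sign of B^2 - 4AC.
% Rows: Laplace (u_xx + u_yy = 0), heat (u_xx = u_t), wave (u_xx - u_tt = 0)
coeffs = [1 0 1; 1 0 0; 1 0 -1];
labels = {'Laplace', 'heat', 'wave'};
types  = cell(1,3);
for m = 1:size(coeffs,1)
    A = coeffs(m,1); B = coeffs(m,2); C = coeffs(m,3);
    disc = B^2 - 4*A*C;
    if disc < 0
        types{m} = 'elliptic';
    elseif disc == 0
        types{m} = 'parabolic';
    else
        types{m} = 'hyperbolic';
    end
    fprintf('%-8s B^2-4AC = %2d -> %s\n', labels{m}, disc, types{m});
end
```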

As an example of a hyperbolic equation, we consider the one-dimensional model
for a vibrating string. The displacement u(x, t) is governed by the wave equation

(5) $\rho u_{tt}(x, t) = T u_{xx}(x, t)$ for 0 < x < L and 0 < t < ∞,

with the given initial position and velocity functions

(6) $u(x, 0) = f(x)$ for t = 0 and 0 < x < L,
    $u_t(x, 0) = g(x)$ for t = 0 and 0 < x < L,

and the boundary values

(7) $u(0, t) = 0$ for x = 0 and 0 < t < ∞,
    $u(L, t) = 0$ for x = L and 0 < t < ∞.

The constant ρ is the mass of the string per unit length and T is the tension in the
string. A diagram of a string with fixed ends at the locations (0, 0) and (L, 0) is shown
in Figure 10.1.

Figure 10.1 The wave equation models a vibrating string.

As an example of a parabolic equation, we consider the one-dimensional model for
heat flow in an insulated rod of length L (see Figure 10.2). The heat equation, which
involves the temperature u(x, t) in the rod at the position x and time t, is

(8) $\kappa u_{xx}(x, t) = \sigma\rho\, u_t(x, t)$ for 0 < x < L and 0 < t < ∞,

the initial temperature distribution at t = 0 is

(9) $u(x, 0) = f(x)$ for t = 0 and 0 < x < L,

and the boundary values at the ends of the rod are

(10) $u(0, t) = c_1$ for x = 0 and 0 < t < ∞,
     $u(L, t) = c_2$ for x = L and 0 < t < ∞.

Figure 10.2 The heat equation models the temperature in an insulated rod.

The constant κ is the coefficient of thermal conductivity, σ is the specific heat, and ρ
is the density of the material in the rod.

As an example of an elliptic equation, consider the potential function u(x, y),
which might represent a steady-state electrostatic potential or a steady-state temperature
distribution in a rectangular region in the plane. These situations are modeled
with Laplace's equation in a rectangle:

(11) $u_{xx}(x, y) + u_{yy}(x, y) = 0$ for 0 < x < 1 and 0 < y < 1,

with boundary conditions specified:

$u(x, 0) = f_1(x)$ for y = 0 and 0 < x < 1 (on the bottom),

$u(x, 1) = f_2(x)$ for y = 1 and 0 < x < 1 (on the top),

$u(0, y) = f_3(y)$ for x = 0 and 0 < y < 1 (on the left),

$u(1, y) = f_4(y)$ for x = 1 and 0 < y < 1 (on the right).

A contour plot for u(x, y) with boundary functions $f_1(x) = 0$, $f_2(x) = \sin(\pi x)$,
$f_3(y) = 0$, and $f_4(y) = 0$ over the square $R = \{(x, y) : 0 \le x \le 1,\ 0 \le y \le 1\}$ is
shown in Figure 10.3.


10.1 Hyperbolic Equations 
Wave Equation 

As an example of a hyperbolic partial differential equation, we consider the wave equation

(1) $u_{tt}(x, t) = c^2 u_{xx}(x, t)$ for 0 < x < a and 0 < t < b,

with the boundary conditions

(2) $u(0, t) = 0$ and $u(a, t) = 0$ for 0 < t < b,
    $u(x, 0) = f(x)$ for 0 < x < a,
    $u_t(x, 0) = g(x)$ for 0 < x < a.

The wave equation models the displacement u of a vibrating elastic string with fixed
ends at x = 0 and x = a. Although analytic solutions to the wave equation can
be obtained with Fourier series, we use the problem as a prototype of a hyperbolic
equation.


Derivation of the Difference Equation 


Partition the rectangle $R = \{(x, t) : 0 \le x \le a,\ 0 \le t \le b\}$ into a grid consisting of
n - 1 by m - 1 rectangles with sides Δx = h and Δt = k, as shown in Figure 10.4. Start
at the bottom row, where $t = t_1 = 0$ and the solution is known to be $u(x_i, t_1) = f(x_i)$.
We shall use a difference-equation method to compute approximations

$\{u_{i,j} : i = 1, 2, \ldots, n\}$ in successive rows for $j = 2, 3, \ldots, m$.

The true solution value at the grid points is $u(x_i, t_j)$.

The central-difference formulas for approximating $u_{tt}(x, t)$ and $u_{xx}(x, t)$ are

(3) $u_{tt}(x, t) = \dfrac{u(x, t+k) - 2u(x, t) + u(x, t-k)}{k^2} + O(k^2)$

and

(4) $u_{xx}(x, t) = \dfrac{u(x+h, t) - 2u(x, t) + u(x-h, t)}{h^2} + O(h^2).$

The grid spacing is uniform in every row: $x_{i+1} = x_i + h$ (and $x_{i-1} = x_i - h$); and it is
uniform in every column: $t_{j+1} = t_j + k$ (and $t_{j-1} = t_j - k$). Next, we drop the terms
$O(k^2)$ and $O(h^2)$ and use the approximation $u_{i,j}$ for $u(x_i, t_j)$ in equations (3) and (4),
which in turn are substituted into (1); this produces the difference equation

(5) $\dfrac{u_{i,j+1} - 2u_{i,j} + u_{i,j-1}}{k^2} = c^2\,\dfrac{u_{i+1,j} - 2u_{i,j} + u_{i-1,j}}{h^2},$

which approximates the solution to (1). For convenience, the substitution r = ck/h is
introduced in (5), and we obtain the relation

(6) $u_{i,j+1} - 2u_{i,j} + u_{i,j-1} = r^2(u_{i+1,j} - 2u_{i,j} + u_{i-1,j}).$

Equation (6) is employed to find row j + 1 across the grid, assuming that approximations
in both rows j and j - 1 are known:

(7) $u_{i,j+1} = (2 - 2r^2)u_{i,j} + r^2(u_{i+1,j} + u_{i-1,j}) - u_{i,j-1},$

for $i = 2, 3, \ldots, n-1$. The four known values on the right side of equation (7), which
are used to create the approximation $u_{i,j+1}$, are shown in Figure 10.5.

Caution must be taken when using formula (7). If the error made at one stage of
the calculations is eventually dampened out, the method is called stable. To guarantee
stability in formula (7), it is necessary that $r = ck/h \le 1$. There are other schemes,
called implicit methods, that are more complicated to implement, but do not have stability
restrictions for r (see Reference [90]).


Starting Values 

Two starting rows of values corresponding to j = 1 and j = 2 must be supplied in
order to use formula (7) to compute the third row. Since the second row is not usually
given, the boundary function g(x) is used to help produce starting approximations in
the second row. Fix $x = x_i$ at the boundary and apply Taylor's formula of order 1 for
expanding u(x, t) about $(x_i, 0)$. The value $u(x_i, k)$ satisfies

(8) $u(x_i, k) = u(x_i, 0) + u_t(x_i, 0)k + O(k^2).$

Then use $u(x_i, 0) = f(x_i) = f_i$ and $u_t(x_i, 0) = g(x_i) = g_i$ in (8) to produce the
formula for computing the numerical approximations in the second row:

(9) $u_{i,2} = f_i + k g_i$ for $i = 2, 3, \ldots, n-1$.

Usually, $u(x_i, t_2) \ne u_{i,2}$, and such errors introduced by formula (9) will propagate
throughout the grid and will not be dampened out when the scheme in (7) is implemented.
Hence it is prudent to use a very small step size for k so that the values for
$u_{i,2}$ given in (9) do not contain a large amount of truncation error.

Often, the boundary function f(x) has a second derivative f''(x) over the interval.
In this case we have $u_{xx}(x, 0) = f''(x)$, and it is beneficial to use the Taylor formula
of order n = 2 to help construct the second row. To do this, we go back to the wave
equation and use the relationship between the second-order partial derivatives to obtain

(10) $u_{tt}(x_i, 0) = c^2 u_{xx}(x_i, 0) = c^2 f''(x_i) = c^2\,\dfrac{f_{i+1} - 2f_i + f_{i-1}}{h^2} + O(h^2).$

Recall that Taylor's formula of order 2 is

(11) $u(x, k) = u(x, 0) + u_t(x, 0)k + \dfrac{u_{tt}(x, 0)k^2}{2} + O(k^3).$

Applying formula (11) at $x = x_i$, together with (9) and (10), we get

(12) $u(x_i, k) = f_i + k g_i + \dfrac{c^2 k^2}{2h^2}(f_{i+1} - 2f_i + f_{i-1}) + O(h^2)O(k^2) + O(k^3).$

Using r = ck/h, formula (12) can be simplified to obtain a difference formula for the
improved numerical approximations in the second row:

(13) $u_{i,2} = (1 - r^2)f_i + k g_i + \dfrac{r^2}{2}(f_{i+1} + f_{i-1})$

for $i = 2, 3, \ldots, n-1$.
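The difference between the two starting formulas can be seen numerically. The sketch below uses illustrative data chosen here: c = 2, h = 0.1, k = 0.05, f(x) = sin(πx) + sin(2πx), and g(x) = 0 on [0, 1], for which u(x, t) = sin(πx)cos(2πt) + sin(2πx)cos(4πt) satisfies the wave equation and both initial conditions. It compares the second row from formula (9) and from formula (13) with the true values u(x_i, k).

```matlab
% Second-row starting values: first-order formula (9) vs second-order (13)
c = 2; h = 0.1; k = 0.05;
r = c*k/h;
x = h*(0:10);                        % grid x_1,...,x_11 on [0,1]
n = length(x);
f = sin(pi*x) + sin(2*pi*x);         % initial position
g = zeros(size(x));                  % initial velocity
i = 2:n-1;                           % interior indices
u9  = f(i) + k*g(i);                                             % formula (9)
u13 = (1 - r^2)*f(i) + k*g(i) + (r^2/2)*(f(i+1) + f(i-1));       % formula (13)
uex = sin(pi*x(i))*cos(2*pi*k) + sin(2*pi*x(i))*cos(4*pi*k);     % u(x_i,k)
err9  = max(abs(uex - u9));
err13 = max(abs(uex - u13));
fprintf('max error, formula (9):  %.3e\n', err9);
fprintf('max error, formula (13): %.3e\n', err13);
```

For this particular data r = 1 and formula (13) reproduces the second row to round-off, whereas formula (9) simply copies the first row because g = 0.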


D’Alembert’s Solution 

The French mathematician Jean Le Rond d'Alembert (1717-1783) discovered that

(14) $u(x, t) = F(x + ct) + G(x - ct)$

is a solution to the wave equation (1) over the interval 0 < x < a, provided that
F', F'', G', and G'' all exist and F and G have period 2a and obey the relationships
F(-z) = -F(z), F(z + 2a) = F(z), G(-z) = -G(z), and G(z + 2a) = G(z) for
all z. We can check this out by direct substitution. The second-order partial derivatives
of the solution (14) are

(15) $u_{tt}(x, t) = c^2 F''(x + ct) + c^2 G''(x - ct),$

(16) $u_{xx}(x, t) = F''(x + ct) + G''(x - ct).$

Substitution of these quantities into (1) produces the desired relationship:

$$u_{tt}(x, t) = c^2 F''(x + ct) + c^2 G''(x - ct) = c^2\left(F''(x + ct) + G''(x - ct)\right) = c^2 u_{xx}(x, t).$$

The particular solution that has the boundary values u(x, 0) = f(x) and $u_t(x, 0) = 0$
requires that F(x) = G(x) = f(x)/2 and is left for the reader to verify.


Two Exact Rows Given 

The accuracy of the numerical approximations produced by the equations in (7) de¬ 
pends on the truncation errors in the formulas used to convert the partial differential 
equation into a difference equation. Although it is unlikely to know values of the exact 
solution for the second row of the grid, if such knowledge were available, using the 
increment/: = ch along the r-axis will generate an exact solution at all the other points 
throughout the grid. 

Theorem 10.1. Assume that the two rows of values $u_{i,1} = u(x_i, 0)$ and $u_{i,2} = u(x_i, k)$,
for $i = 1, 2, \ldots, n$, are the exact solutions to the wave equation (1). If the
step size $k = h/c$ is chosen along the t-axis, then $r = 1$ and formula (7) becomes

(17) $u_{i,j+1} = u_{i+1,j} + u_{i-1,j} - u_{i,j-1}$.

Furthermore, the finite-difference solutions produced by (17) throughout the grid are
exact solution values to the differential equation (neglecting computer round-off error).

Proof. Use d'Alembert's solution and the relation $ck = h$. The calculation
$x_i - ct_j = (i-1)h - c(j-1)k = (i-1)h - (j-1)h = (i-j)h$ and a similar one producing
$x_i + ct_j = (i+j-2)h$ are used in equation (14), with the labels of the arbitrary
functions $F$ and $G$ interchanged, to produce the following special form of $u_{i,j}$:

(18) $u_{i,j} = F((i-j)h) + G((i+j-2)h)$,

for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$. Applying this formula to the terms
$u_{i+1,j}$, $u_{i-1,j}$, and $u_{i,j-1}$ on the right side of (17) yields

$u_{i+1,j} + u_{i-1,j} - u_{i,j-1}$

$= F((i+1-j)h) + F((i-1-j)h) - F((i-(j-1))h)$

$\quad + G((i+1+j-2)h) + G((i-1+j-2)h) - G((i+j-1-2)h)$

$= F((i-(j+1))h) + G((i+j+1-2)h) = u_{i,j+1}$,

for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$. ■
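
The statement of Theorem 10.1 can be checked numerically. The following is a minimal sketch, not part of Program 10.1: the functions F and G, the wave speed, and the grid sizes are arbitrary illustrative choices, and the boundary values are simply taken from the exact solution.

```matlab
% Sketch: check Theorem 10.1 for u_tt = c^2 u_xx with r = c*k/h = 1.
% F, G, c, h, n, m are illustrative assumptions, not values from the text.
c = 2;  h = 0.1;  k = h/c;             % k = h/c gives r = 1
n = 11; m = 11;                        % grid points along x and t
F = @(z) exp(-z.^2);
G = @(z) sin(3*z);
u = @(x,t) F(x + c*t) + G(x - c*t);    % d'Alembert form of the solution
x = (0:n-1)*h;
U = zeros(n,m);
U(:,1) = u(x,0)';                      % exact first row
U(:,2) = u(x,k)';                      % exact second row
for j = 2:m-1
    U(1,j+1) = u(x(1),j*k);            % boundary values from the exact solution
    U(n,j+1) = u(x(n),j*k);
    for i = 2:n-1
        U(i,j+1) = U(i+1,j) + U(i-1,j) - U(i,j-1);   % formula (17)
    end
end
[T,X] = meshgrid((0:m-1)*k, x);        % T(i,j) = t_j, X(i,j) = x_i
maxErr = max(max(abs(U - u(X,T))));
fprintf('max |U - u| = %.2e\n', maxErr);
```

The reported difference should be on the order of round-off error, which is what the theorem predicts when both starting rows are exact.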

Warning. Theorem 10.1 does not guarantee that the numerical solutions are exact
when numerical calculations based on (9) and (13) are used to construct approximations
$u_{i,2}$ in the second row. Indeed, truncation error will be introduced if
$u_{i,2} \ne u(x_i, k)$ for some $i$, where $1 \le i \le n$. This is why we endeavor to obtain
the best possible values for the second row by using the second-order Taylor
approximations in equation (13).

Example 10.1. Use the finite-difference method to solve the wave equation for a vibrating
string:

(19) $u_{tt}(x,t) = 4u_{xx}(x,t)$ for $0 < x < 1$ and $0 < t < 0.5$,

with the boundary conditions

(20) $u(0,t) = 0$ and $u(1,t) = 0$ for $0 \le t \le 0.5$,

$u(x,0) = f(x) = \sin(\pi x) + \sin(2\pi x)$ for $0 \le x \le 1$,

$u_t(x,0) = g(x) = 0$ for $0 \le x \le 1$.

For convenience we choose $h = 0.1$ and $k = 0.05$. Since $c = 2$, this yields
$r = ck/h = 2(0.05)/0.1 = 1$. Since $g(x) = 0$ and $r = 1$, formula (13) for creating the
second row is

(21) $u_{i,2} = \frac{f_{i-1} + f_{i+1}}{2}$ for $i = 2, 3, \ldots, 10$.

Substituting $r = 1$ into equation (7) gives the simplified difference equation

(22) $u_{i,j+1} = u_{i+1,j} + u_{i-1,j} - u_{i,j-1}$.

Applying formulas (21) and (22) successively to generate rows will produce the
approximations to $u(x,t)$ given in Table 10.1 for $0 < x_i < 1$ and $0 \le t_j \le 0.50$.

The numerical values in Table 10.1 agree to more than six decimal places of accuracy
with those obtained with the analytic solution

$u(x,t) = \sin(\pi x)\cos(2\pi t) + \sin(2\pi x)\cos(4\pi t)$.
Table 10.1 Solution of the Wave Equation (19) with Boundary Conditions (20)

t_j      x_2        x_3        x_4        x_5        x_6        x_7        x_8        x_9        x_10
0.00   0.896802   1.538842   1.760074   1.538842   1.000000   0.363271  -0.142040  -0.363271  -0.278768
0.05   0.769421   1.328438   1.538842   1.380037   0.951056   0.428980   0.000000  -0.210404  -0.181636
0.10   0.431636   0.769421   0.948401   0.951056   0.809017   0.587785   0.360616   0.181636   0.068364
0.15   0.000000   0.051599   0.181636   0.377381   0.587785   0.740653   0.769421   0.639384   0.363271
0.20  -0.380037  -0.587785  -0.519421  -0.181636   0.309017   0.769421   1.019421   0.951056   0.571020
0.25  -0.587785  -0.951056  -0.951056  -0.587785   0.000000   0.587785   0.951056   0.951056   0.587785
0.30  -0.571020  -0.951056  -1.019421  -0.769421  -0.309017   0.181636   0.519421   0.587785   0.380037
0.35  -0.363271  -0.639384  -0.769421  -0.740653  -0.587785  -0.377381  -0.181636  -0.051599   0.000000
0.40  -0.068364  -0.181636  -0.360616  -0.587785  -0.809017  -0.951056  -0.948401  -0.769421  -0.431636
0.45   0.181636   0.210404   0.000000  -0.428980  -0.951056  -1.380037  -1.538842  -1.328438  -0.769421
0.50   0.278768   0.363271   0.142040  -0.363271  -1.000000  -1.538842  -1.760074  -1.538842  -0.896802


Figure 10.6 The vibrating string for equations (19) and (20).


A three-dimensional presentation of the data in Table 10.1 is given in Figure 10.6. ■
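
The computation in Example 10.1 can be sketched in a few lines. This is an illustrative script rather than Program 10.1 itself; the variable names are assumptions, and the second row is built from formula (21), which applies only because g(x) = 0 and r = 1.

```matlab
% Sketch reproducing Example 10.1: c = 2, h = 0.1, k = 0.05, so r = 1.
f = @(x) sin(pi*x) + sin(2*pi*x);      % initial displacement from (20)
h = 0.1;  k = 0.05;
n = 11;   m = 11;
x = (0:n-1)*h;  t = (0:m-1)*k;
U = zeros(n,m);                        % U(1,:) and U(n,:) stay zero (fixed ends)
U(2:n-1,1) = f(x(2:n-1))';
U(2:n-1,2) = (f(x(1:n-2)) + f(x(3:n)))'/2;                 % formula (21)
for j = 2:m-1
    U(2:n-1,j+1) = U(3:n,j) + U(1:n-2,j) - U(2:n-1,j-1);   % formula (22)
end
[T,X] = meshgrid(t,x);                 % T(i,j) = t_j, X(i,j) = x_i
Uexact = sin(pi*X).*cos(2*pi*T) + sin(2*pi*X).*cos(4*pi*T);
fprintf('max error = %.2e\n', max(abs(U(:) - Uexact(:))));
disp(U(2:n-1,:)');                     % rows are times t_j, columns are x_2..x_10
```

The displayed matrix is laid out like Table 10.1, and the printed error should be near round-off, in line with Theorem 10.1.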


Example 10.2. Use the finite-difference method to solve the wave equation for a vibrating
string:

(23) $u_{tt}(x,t) = 4u_{xx}(x,t)$ for $0 < x < 1$ and $0 < t < 0.5$,

with the boundary conditions 



(24) 
Table 10.2 Solution of the Wave Equation (23) with Boundary Conditions (24)

t_j     x_2     x_3     x_4     x_5     x_6     x_7     x_8     x_9     x_10
0.00   0.100   0.200   0.300   0.400   0.500   0.600   0.450   0.300   0.150
0.05   0.100   0.200   0.300   0.400   0.500   0.475   0.450   0.300   0.150
0.10   0.100   0.200   0.300   0.400   0.375   0.350   0.325   0.300   0.150
0.15   0.100   0.200   0.300   0.275   0.250   0.225   0.200   0.175   0.150
0.20   0.100   0.200   0.175   0.150   0.125   0.100   0.075   0.050   0.025
0.25   0.100   0.075   0.050   0.025   0.000  -0.025  -0.050  -0.075  -0.100
0.30  -0.025  -0.050  -0.075  -0.100  -0.125  -0.150  -0.175  -0.200  -0.100
0.35  -0.150  -0.175  -0.200  -0.225  -0.250  -0.275  -0.300  -0.200  -0.100
0.40  -0.150  -0.300  -0.325  -0.350  -0.375  -0.400  -0.300  -0.200  -0.100
0.45  -0.150  -0.300  -0.450  -0.475  -0.500  -0.400  -0.300  -0.200  -0.100
0.50  -0.150  -0.300  -0.450  -0.600  -0.500  -0.400  -0.300  -0.200  -0.100


Figure 10.7 The vibrating string for equations (23) and (24).


For convenience we choose $h = 0.1$ and $k = 0.05$. Since $c = 2$, this yields $r = 1$, and
generating rows in the same manner as in Example 10.1 will produce the
approximations to $u(x,t)$ given in Table 10.2 for $0 < x_i < 1$ and $0 \le t_j \le 0.50$. A
three-dimensional presentation of the data in Table 10.2 is given in Figure 10.7. ■


Program 10.1 approximates the solution of the wave equation (equations (1) and (2)). A
three-dimensional presentation of the output matrix U can be obtained by using the
commands mesh(U) or surf(U). Additionally, the command contour(U) will produce a
graph analogous to Figure 10.3, while the command contour3(U) will produce the
three-dimensional analog of Figure 10.3.
Program 10.1 (Finite-difference Solution for the Wave Equation). To approximate
the solution of $u_{tt}(x,t) = c^2 u_{xx}(x,t)$ over $R = \{(x,t) : 0 \le x \le a,\ 0 \le t \le b\}$
with $u(0,t) = 0$, $u(a,t) = 0$, for $0 \le t \le b$, and $u(x,0) = f(x)$,
$u_t(x,0) = g(x)$, for $0 \le x \le a$.


function U = finedif(f,g,a,b,c,n,m)
%Input  - f=u(x,0) as a string 'f'
%       - g=ut(x,0) as a string 'g'
%       - a and b right end points of [0,a] and [0,b]
%       - c the constant in the wave equation
%       - n and m number of grid points over [0,a] and [0,b]
%Output - U solution matrix; analogous to Table 10.1

%Initialize parameters and U
h=a/(n-1);
k=b/(m-1);
r=c*k/h;
r2=r^2;
r22=r^2/2;
s1=1-r^2;
s2=2-2*r^2;
U=zeros(n,m);

%Compute first and second rows
for i=2:n-1
   U(i,1)=feval(f,h*(i-1));
   U(i,2)=s1*feval(f,h*(i-1))+k*feval(g,h*(i-1)) ...
          +r22*(feval(f,h*i)+feval(f,h*(i-2)));
end

%Compute remaining rows of U
for j=3:m,
   for i=2:(n-1),
      U(i,j) = s2*U(i,j-1)+r2*(U(i-1,j-1)+U(i+1,j-1))-U(i,j-2);
   end
end

U=U';
Exercises for Hyperbolic Equations


1. (a) Verify by direct substitution that $u(x,t) = \sin(n\pi x)\cos(2n\pi t)$ is a solution to
the wave equation $u_{tt}(x,t) = 4u_{xx}(x,t)$ for each positive integer $n = 1, 2, \ldots$.
(b) Verify by direct substitution that $u(x,t) = \sin(n\pi x)\cos(cn\pi t)$ is a solution
to the wave equation $u_{tt}(x,t) = c^2 u_{xx}(x,t)$ for each positive integer $n = 1, 2, \ldots$.

2. Assume that the initial position and velocity are $u(x,0) = f(x)$ and $u_t(x,0) = 0$,
respectively. Show that the d'Alembert solution for this case is

$u(x,t) = \frac{f(x+ct) + f(x-ct)}{2}$.

3. Obtain a simplified form of the difference equation (7) in the case $h = 2ck$.

In Exercises 4 and 5, use the finite-difference method to calculate the first three rows of 
the approximate solution for the given wave equation. Carry out your calculations by hand 
(calculator). 

4. $u_{tt}(x,t) = 4u_{xx}(x,t)$, for $0 < x < 1$ and $0 < t < 0.5$, with the boundary conditions


$u(x,0) = f(x) = \sin(\pi x)$ for $0 \le x \le 1$,

$u_t(x,0) = g(x) = 0$ for $0 \le x \le 1$.

Let $h = 0.2$, $k = 0.1$, and $r = 1$.

5. $u_{tt}(x,t) = 4u_{xx}(x,t)$, for $0 < x < 1$ and $0 < t < 0.5$, with the boundary conditions
$u(0,t) = 0$ and $u(1,t) = 0$ for $0 \le t \le 0.5$,


for 0 < x < I, 


15 -15* 

1—4— fo 

u t (x, 0) = g(x) = 0 for 0 < x < 1. 


< x < 1, 


Let $h = 0.2$, $k = 0.1$, and $r = 1$.

6. Assume that the initial position and velocity are $u(x,0) = f(x)$ and $u_t(x,0) = g(x)$,
respectively. Show that the d'Alembert solution for this case is

$u(x,t) = \frac{f(x+ct) + f(x-ct)}{2} + \frac{1}{2c}\int_{x-ct}^{x+ct} g(s)\,ds$.

7. For the equation $u_{tt}(x,t) = 9u_{xx}(x,t)$, what relationship between $h$ and $k$ must occur
in order to produce the difference equation $u_{i,j+1} = u_{i+1,j} + u_{i-1,j} - u_{i,j-1}$?

8. What difficulty might occur when trying to use the finite-difference method to solve
$u_{tt}(x,t) = 4u_{xx}(x,t)$ with the choice $k = 0.02$ and $h = 0.03$?
Algorithms and Programs


In Problems 1 to 8, use Program 10.1 to solve the wave equation $u_{tt}(x,t) = c^2 u_{xx}(x,t)$,
for $0 < x < a$ and $0 < t < b$, with the boundary conditions

$u(0,t) = 0$ and $u(a,t) = 0$ for $0 \le t \le b$,

$u(x,0) = f(x)$ for $0 \le x \le a$,

$u_t(x,0) = g(x)$ for $0 \le x \le a$,

for the given values. Use the surf and contour commands to plot your approximate
solutions.


1. Use $a = 1$, $b = 1$, $c = 1$, $f(x) = \sin(\pi x)$, and $g(x) = 0$. For convenience, choose
$h = 0.1$ and $k = 0.1$.


2. Use $a = 1$, $b = 1$, $c = 1$, $f(x) = x - x^2$, and $g(x) = 0$. For convenience, choose
$h = 0.1$ and $k = 0.1$.

$2 - 2x$ for $\frac{1}{2} \le x \le 1$,

$g(x) = 0$, $h = 0.1$, and $k = 0.1$.

4. Use $a = 1$, $b = 1$, $c = 2$, $f(x) = \sin(\pi x)$, $g(x) = 0$, $h = 0.1$, and $k = 0.05$.

5. Use $a = 1$, $b = 1$, $c = 2$, $f(x) = x - x^2$, $g(x) = 0$, $h = 0.1$, and $k = 0.05$.

6. Repeat Problem 3, but with $c = 2$ and $k = 0.05$.

7. Repeat Problem 1, but with $f(x) = \sin(2\pi x) + \sin(4\pi x)$.


8. Repeat Problem 1, but with $c = 2$, $f(x) = \sin(2\pi x) + \sin(4\pi x)$, and $k = 0.05$.


10.2 Parabolic Equations


Heat Equation

As an example of parabolic differential equations, we consider the one-dimensional
heat equation

(1) $u_t(x,t) = c^2 u_{xx}(x,t)$ for $0 < x < a$ and $0 < t < b$,

with the initial condition

(2) $u(x,0) = f(x)$ for $t = 0$ and $0 \le x \le a$,

and the boundary conditions

(3) $u(0,t) = g_1(t) \equiv c_1$ for $x = 0$ and $0 \le t \le b$,
    $u(a,t) = g_2(t) \equiv c_2$ for $x = a$ and $0 \le t \le b$.

The heat equation models the temperature in an insulated rod with ends held at constant
temperatures $c_1$ and $c_2$ and the initial temperature distribution along the rod being
$f(x)$. Although analytic solutions to the heat equation can be obtained with Fourier
series, we use the problem as a prototype of a parabolic equation for numerical solution.


Derivation of the Difference Equation

Assume that the rectangle $R = \{(x,t) : 0 \le x \le a,\ 0 \le t \le b\}$ is subdivided into
$n - 1$ by $m - 1$ rectangles with sides $\Delta x = h$ and $\Delta t = k$, as shown in Figure 10.8.
Start at the bottom row, where $t = t_1 = 0$, and the solution is $u(x_i, t_1) = f(x_i)$. A
method for computing the approximations to $u(x,t)$ at grid points in successive rows
$\{u(x_i, t_j) : i = 1, 2, \ldots, n\}$, for $j = 2, 3, \ldots, m$, will be developed.

The difference formulas used for $u_t(x,t)$ and $u_{xx}(x,t)$ are

(4) $u_t(x,t) = \frac{u(x,t+k) - u(x,t)}{k} + O(k)$

and

(5) $u_{xx}(x,t) = \frac{u(x-h,t) - 2u(x,t) + u(x+h,t)}{h^2} + O(h^2)$.

The grid spacing is uniform in every row: $x_{i+1} = x_i + h$ (and $x_{i-1} = x_i - h$), and
it is uniform in every column: $t_{j+1} = t_j + k$. Next, we drop the terms $O(k)$ and $O(h^2)$
and use the approximation $u_{i,j}$ for $u(x_i, t_j)$ in equations (4) and (5), which are in turn
substituted into equation (1) to obtain

(6) $\frac{u_{i,j+1} - u_{i,j}}{k} = c^2\, \frac{u_{i-1,j} - 2u_{i,j} + u_{i+1,j}}{h^2}$,
which approximates the solution to (1). For convenience, the substitution $r = c^2 k/h^2$
is introduced in (6), and the result is the explicit forward-difference equation

(7) $u_{i,j+1} = (1 - 2r)u_{i,j} + r(u_{i-1,j} + u_{i+1,j})$.

Equation (7) is employed to create the $(j+1)$th row across the grid, assuming that
approximations in the $j$th row are known. Notice that this formula explicitly gives the
value $u_{i,j+1}$ in terms of $u_{i-1,j}$, $u_{i,j}$, and $u_{i+1,j}$. The computational stencil
representing the situation in formula (7) is given in Figure 10.9.

Figure 10.9 The forward-difference stencil.

The simplicity of formula (7) makes it appealing to use. However, it is important
to use numerical techniques that are stable. If any error made at one stage of
the calculations is eventually dampened out, the method is called stable. The explicit
forward-difference equation (7) is stable if and only if $r$ is restricted to the interval
$0 \le r \le \frac{1}{2}$. This means that the step size $k$ must satisfy $k \le h^2/(2c^2)$. If this
condition is not fulfilled, errors committed in one line $\{u_{i,j}\}$ might be magnified in
subsequent lines $\{u_{i,p}\}$ for some $p > j$. The next example illustrates this point.

Example 10.3. Use the forward-difference method to solve the heat equation

(8) $u_t(x,t) = u_{xx}(x,t)$ for $0 < x < 1$ and $0 < t < 0.20$,

with the initial condition

(9) $u(x,0) = f(x) = 4x - 4x^2$ for $t = 0$ and $0 \le x \le 1$,

and the boundary conditions

(10) $u(0,t) = g_1(t) \equiv 0$ for $x = 0$ and $0 \le t \le 0.20$,
     $u(1,t) = g_2(t) \equiv 0$ for $x = 1$ and $0 \le t \le 0.20$.

For the first illustration, we use the step sizes $\Delta x = h = 0.2$ and $\Delta t = k = 0.02$ and
$c = 1$, so the ratio is $r = 0.5$. The grid will be $n = 6$ columns wide by $m = 11$ rows high.
In this case, formula (7) becomes

(11) $u_{i,j+1} = \frac{u_{i-1,j} + u_{i+1,j}}{2}$.
Table 10.3 Using the Forward-difference Method with r = 0.5

                x_1 = 0.00  x_2 = 0.20  x_3 = 0.40  x_4 = 0.60  x_5 = 0.80  x_6 = 1.00
t_1  = 0.00     0.000000    0.640000    0.960000    0.960000    0.640000    0.000000
t_2  = 0.02     0.000000    0.480000    0.800000    0.800000    0.480000    0.000000
t_3  = 0.04     0.000000    0.400000    0.640000    0.640000    0.400000    0.000000
t_4  = 0.06     0.000000    0.320000    0.520000    0.520000    0.320000    0.000000
t_5  = 0.08     0.000000    0.260000    0.420000    0.420000    0.260000    0.000000
t_6  = 0.10     0.000000    0.210000    0.340000    0.340000    0.210000    0.000000
t_7  = 0.12     0.000000    0.170000    0.275000    0.275000    0.170000    0.000000
t_8  = 0.14     0.000000    0.137500    0.222500    0.222500    0.137500    0.000000
t_9  = 0.16     0.000000    0.111250    0.180000    0.180000    0.111250    0.000000
t_10 = 0.18     0.000000    0.090000    0.145625    0.145625    0.090000    0.000000
t_11 = 0.20     0.000000    0.072812    0.117813    0.117813    0.072812    0.000000


Formula (11) is stable for $r = 0.5$ and can be used successfully to generate reasonably
accurate approximations to $u(x,t)$. Successive rows in the grid are given in Table 10.3.
A three-dimensional presentation of the data in Table 10.3 is given in Figure 10.10.

For our second illustration, we use the step sizes $\Delta x = h = 0.2$ and
$\Delta t = k = \frac{1}{30} \approx 0.033333$, so that the ratio is $r = 0.833333$. In this case,
formula (7) becomes

(12) $u_{i,j+1} = -0.666665\,u_{i,j} + 0.833333\,(u_{i-1,j} + u_{i+1,j})$.
Table 10.4 Using the Forward-difference Method with r = 0.833333

                    x_1 = 0.00  x_2 = 0.20  x_3 = 0.40  x_4 = 0.60  x_5 = 0.80  x_6 = 1.00
t_1  = 0.000000     0.000000    0.640000    0.960000    0.960000    0.640000    0.000000
t_2  = 0.033333     0.000000    0.373333    0.693333    0.693333    0.373333    0.000000
t_3  = 0.066667     0.000000    0.328889    0.426667    0.426667    0.328889    0.000000
t_4  = 0.100000     0.000000    0.136296    0.345185    0.345185    0.136296    0.000000
t_5  = 0.133333     0.000000    0.196790    0.171111    0.171111    0.196790    0.000000
t_6  = 0.166667     0.000000    0.011399    0.192510    0.192510    0.011399    0.000000
t_7  = 0.200000     0.000000    0.152826    0.041584    0.041584    0.152826    0.000000
t_8  = 0.233333     0.000000   -0.067230    0.134286    0.134286   -0.067230    0.000000
t_9  = 0.266667     0.000000    0.156725   -0.033644   -0.033644    0.156725    0.000000
t_10 = 0.300000     0.000000   -0.132520    0.124997    0.124997   -0.132520    0.000000
t_11 = 0.333333     0.000000    0.192511   -0.089601   -0.089601    0.192511    0.000000


Formula (12) is unstable in this case, because $r > \frac{1}{2}$, and errors committed at one row
will be magnified in successive rows. Numerical values that turn out to be imprecise
approximations to $u(x,t)$, for $0 \le t \le 0.33333$, are given in Table 10.4. A three-dimensional
presentation of the data in Table 10.4 is given in Figure 10.11.
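
The two illustrations of Example 10.3 can be compared side by side with a short script. This is a sketch only, with variable names chosen for illustration; it applies formula (7) directly instead of calling Program 10.2.

```matlab
% Sketch: forward-difference formula (7) for Example 10.3 with two step sizes.
f = @(x) 4*x - 4*x.^2;          % initial condition (9)
c = 1;  h = 0.2;  n = 6;  m = 11;
x = (0:n-1)*h;
for k = [0.02, 1/30]            % r = 0.5 (stable) and r = 0.8333 (unstable)
    r = c^2*k/h^2;
    U = zeros(n,m);             % zero boundary values in U(1,:) and U(n,:)
    U(:,1) = f(x)';
    for j = 1:m-1
        U(2:n-1,j+1) = (1-2*r)*U(2:n-1,j) + r*(U(1:n-2,j) + U(3:n,j));
    end
    fprintf('r = %.4f, last row: %s\n', r, mat2str(U(:,m)', 6));
end
```

For r = 0.5 the last row decays smoothly, as in Table 10.3, while for r = 0.8333 the entries alternate in sign from row to row in the manner of Table 10.4.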

The difference equation (7) has accuracy of the order $O(k) + O(h^2)$. Because the
term $O(k)$ decreases linearly as $k$ tends to zero, it is not surprising that it must be made
small to produce good approximations. However, the stability requirement introduces
further considerations. Suppose that the solutions over the grid are not sufficiently
accurate and that both the increments $\Delta x = h_0$ and $\Delta t = k_0$ must be reduced. For
simplicity, suppose that the new x increment is $\Delta x = h_1 = h_0/2$. If the same ratio $r$
is used, $k_1$ must satisfy

$k_1 = \frac{r h_1^2}{c^2} = \frac{r h_0^2}{4c^2} = \frac{k_0}{4}$.

This results in a doubling and quadrupling of the number of grid points along the x-axis
and t-axis, respectively. Consequently, there must be an eightfold increase in the total
computational effort when reducing the grid size in this manner. This extra effort is
usually prohibitive and demands that we explore a more efficient method that does not
have stability restrictions. The method proposed will be implicit rather than explicit.
The apparent rise in the level of complexity will have the immediate payoff of being
unconditionally stable.


The Crank-Nicholson Method

An implicit scheme, invented by John Crank and Phyllis Nicholson (see Reference
[ ]), is based on numerical approximations for solutions of equation (1) at the point
$(x, t + k/2)$ that lies between the rows in the grid. Specifically, the approximation used
for $u_t(x, t + k/2)$ is obtained from the central-difference formula,

(13) $u_t\left(x, t + \frac{k}{2}\right) = \frac{u(x,t+k) - u(x,t)}{k} + O(k^2)$.

The approximation used for $u_{xx}(x, t + k/2)$ is the average of the approximations
$u_{xx}(x,t)$ and $u_{xx}(x, t+k)$, which has an accuracy of the order $O(h^2)$:

(14) $u_{xx}\left(x, t + \frac{k}{2}\right) = \frac{1}{2h^2}\big(u(x-h,t+k) - 2u(x,t+k) + u(x+h,t+k)$
     $\qquad + u(x-h,t) - 2u(x,t) + u(x+h,t)\big) + O(h^2)$.

In a fashion similar to the previous derivation, we substitute (13) and (14) into (1) and
neglect the error terms $O(h^2)$ and $O(k^2)$. Then employing the notation $u_{i,j} = u(x_i, t_j)$
will produce the difference equation

(15) $\frac{u_{i,j+1} - u_{i,j}}{k} = c^2\, \frac{u_{i-1,j+1} - 2u_{i,j+1} + u_{i+1,j+1} + u_{i-1,j} - 2u_{i,j} + u_{i+1,j}}{2h^2}$.

Also, the substitution $r = c^2 k/h^2$ is used in (15). But this time we must solve for the
three "yet to be computed" values $u_{i-1,j+1}$, $u_{i,j+1}$, and $u_{i+1,j+1}$. This is accomplished
by placing them all on the left side of the equation. Then rearrangement of the terms
in equation (15) results in the implicit difference formula

(16) $-r u_{i-1,j+1} + (2 + 2r)u_{i,j+1} - r u_{i+1,j+1} = (2 - 2r)u_{i,j} + r(u_{i-1,j} + u_{i+1,j})$,

for $i = 2, 3, \ldots, n-1$. The terms on the right-hand side of equation (16) are all
known. Hence the equations in (16) form a tridiagonal linear system $AX = B$. The
six points used in the Crank-Nicholson formula (16), together with the intermediate
grid point where the numerical approximations are based, are shown in Figure 10.12.

Implementation of formula (16) is sometimes done by using the ratio $r = 1$. In
this case the increment along the t-axis is $\Delta t = k = h^2/c^2$, and the equations in (16)
simplify and become

(17) $-u_{i-1,j+1} + 4u_{i,j+1} - u_{i+1,j+1} = u_{i-1,j} + u_{i+1,j}$,

for $i = 2, 3, \ldots, n-1$. The boundary conditions are used in the first and last equations
(i.e., $u_{1,j} = u_{1,j+1} = c_1$ and $u_{n,j} = u_{n,j+1} = c_2$, respectively). Equations (17) are
especially pleasing to view in their tridiagonal matrix form $AX = B$:

$$
\begin{bmatrix}
 4 & -1 &        &        &    \\
-1 &  4 & -1     &        &    \\
   & \ddots & \ddots & \ddots &    \\
   &    & -1     &  4     & -1 \\
   &    &        & -1     &  4
\end{bmatrix}
\begin{bmatrix} u_{2,j+1} \\ u_{3,j+1} \\ \vdots \\ u_{n-2,j+1} \\ u_{n-1,j+1} \end{bmatrix}
=
\begin{bmatrix} 2c_1 + u_{3,j} \\ u_{2,j} + u_{4,j} \\ \vdots \\ u_{n-3,j} + u_{n-1,j} \\ u_{n-2,j} + 2c_2 \end{bmatrix}.
$$

When the Crank-Nicholson method is implemented with a computer, the linear system
$AX = B$ can be solved by either direct means or by iteration.
Table 10.5 The Values $u(x_i, t_j)$ Using the Crank-Nicholson Method with $t_j = (j-1)/100$

         x_2        x_3        x_4        x_5        x_6        x_7        x_8        x_9        x_10
t_1    1.118034   1.538842   1.118034   0.363271   0.000000   0.363271   1.118034   1.538842   1.118034
t_2    0.616905   0.928778   0.862137   0.617659   0.490465   0.617659   0.862137   0.928778   0.616905
t_3    0.394184   0.647957   0.718601   0.680009   0.648834   0.680009   0.718601   0.647957   0.394184
t_4    0.288660   0.506682   0.625285   0.666493   0.673251   0.666493   0.625285   0.506682   0.288660
t_5    0.233112   0.425766   0.556006   0.625082   0.645788   0.625082   0.556006   0.425766   0.233112
t_6    0.199450   0.372035   0.499571   0.575402   0.600242   0.575402   0.499571   0.372035   0.199450
t_7    0.175881   0.331490   0.451058   0.525306   0.550354   0.525306   0.451058   0.331490   0.175881
t_8    0.157405   0.298131   0.408178   0.477784   0.501545   0.477784   0.408178   0.298131   0.157405
t_9    0.141858   0.269300   0.369759   0.433821   0.455802   0.433821   0.369759   0.269300   0.141858
t_10   0.128262   0.243749   0.335117   0.393597   0.413709   0.393597   0.335117   0.243749   0.128262
t_11   0.116144   0.220827   0.303787   0.356974   0.375286   0.356974   0.303787   0.220827   0.116144


Example 10.4. Use the Crank-Nicholson method to solve the equation

(18) $u_t(x,t) = u_{xx}(x,t)$ for $0 < x < 1$ and $0 < t < 0.1$,

with the initial condition

(19) $u(x,0) = f(x) = \sin(\pi x) + \sin(3\pi x)$ for $t = 0$ and $0 \le x \le 1$,

and the boundary conditions

(20) $u(0,t) = g_1(t) \equiv 0$ for $x = 0$ and $0 \le t \le 0.1$,
     $u(1,t) = g_2(t) \equiv 0$ for $x = 1$ and $0 \le t \le 0.1$.

For simplicity, we use the step sizes $\Delta x = h = 0.1$ and $\Delta t = k = 0.01$ so that the
ratio is $r = 1$. The grid will be $n = 11$ columns wide by $m = 11$ rows high. Applying the
algorithm generates the values in Table 10.5 for $0 < x_i < 1$ and $0 \le t_j \le 0.1$.

The values obtained with the Crank-Nicholson method compare favorably with the
analytic solution $u(x,t) = \sin(\pi x)e^{-\pi^2 t} + \sin(3\pi x)e^{-9\pi^2 t}$, the true values for the
final row being

t_11   0.115285   0.219204   0.301570   0.354385   0.372569   0.354385   0.301570   0.219204   0.115285

A three-dimensional presentation of the data in Table 10.5 is given in Figure 10.13. ■
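
The rows of Example 10.4 can be regenerated with a compact script. This is a minimal sketch, not Program 10.3: it assumes zero boundary values (so no boundary terms appear on the right-hand side), forms the tridiagonal matrix of formula (16) as a full matrix, and solves with MATLAB's backslash operator.

```matlab
% Sketch: Crank-Nicholson formula (16) for Example 10.4, solved with backslash.
f = @(x) sin(pi*x) + sin(3*pi*x);      % initial condition (19)
c = 1;  h = 0.1;  k = 0.01;  n = 11;  m = 11;
r = c^2*k/h^2;                         % r = 1 for these step sizes
x = (0:n-1)*h;
U = zeros(n,m);                        % assumes c1 = c2 = 0 on the boundary
U(2:n-1,1) = f(x(2:n-1))';
p = n-2;                               % interior unknowns per row
A = (2+2*r)*eye(p) - r*diag(ones(p-1,1),1) - r*diag(ones(p-1,1),-1);
for j = 1:m-1
    B = (2-2*r)*U(2:n-1,j) + r*(U(1:n-2,j) + U(3:n,j));   % right side of (16)
    U(2:n-1,j+1) = A\B;
end
exact = sin(pi*x)*exp(-pi^2*0.1) + sin(3*pi*x)*exp(-9*pi^2*0.1);
disp([U(2:n-1,m)'; exact(2:n-1)]);     % computed final row above the true values
```

With nonzero boundary constants, the terms r*c1 and r*c2 would have to be added to the first and last entries of B; that case is not handled here.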


Program 10.2 (Forward-difference Method for the Heat Equation). To approximate
the solution of $u_t(x,t) = c^2 u_{xx}(x,t)$ over $R = \{(x,t) : 0 \le x \le a,\ 0 \le t \le b\}$
with $u(x,0) = f(x)$, for $0 \le x \le a$, and $u(0,t) = c_1$, $u(a,t) = c_2$, for
$0 \le t \le b$.

function U=forwdif(f,c1,c2,a,b,c,n,m)
%Input  - f=u(x,0) as a string 'f'
%       - c1=u(0,t) and c2=u(a,t)
%       - a and b right end points of [0,a] and [0,b]
%       - c the constant in the heat equation
%       - n and m number of grid points over [0,a] and [0,b]
%Output - U solution matrix; analogous to Table 10.4

%Initialize parameters and U
h=a/(n-1);
k=b/(m-1);
r=c^2*k/h^2;
s=1-2*r;
U=zeros(n,m);

%Boundary conditions
U(1,1:m)=c1;
U(n,1:m)=c2;

%Generate first row
U(2:n-1,1)=feval(f,h:h:(n-2)*h)';

%Generate remaining rows of U
for j=2:m
   for i=2:n-1
      U(i,j)=s*U(i,j-1)+r*(U(i-1,j-1)+U(i+1,j-1));
   end
end
U=U';

Program 10.3 (Crank-Nicholson Method for the Heat Equation). To approximate
the solution of $u_t(x,t) = c^2 u_{xx}(x,t)$ over $R = \{(x,t) : 0 \le x \le a,\ 0 \le t \le b\}$
with $u(x,0) = f(x)$, for $0 \le x \le a$, and $u(0,t) = c_1$, $u(a,t) = c_2$, for
$0 \le t \le b$.

function U=crnich(f,c1,c2,a,b,c,n,m)
%Input  - f=u(x,0) as a string 'f'
%       - c1=u(0,t) and c2=u(a,t)
%       - a and b right end points of [0,a] and [0,b]
%       - c the constant in the heat equation
%       - n and m number of grid points over [0,a] and [0,b]
%Output - U solution matrix; analogous to Table 10.5

%Initialize parameters and U
h=a/(n-1);
k=b/(m-1);
r=c^2*k/h^2;
s1=2+2/r;
s2=2/r-2;
U=zeros(n,m);

%Boundary conditions
U(1,1:m)=c1;
U(n,1:m)=c2;

%Generate first row
U(2:n-1,1)=feval(f,h:h:(n-2)*h)';

%Form the diagonal and off-diagonal elements of A and
%the constant vector B and solve tridiagonal system AX=B
Vd(1,1:n)=s1*ones(1,n);
Vd(1)=1;
Vd(n)=1;
Va=-ones(1,n-1);
Va(n-1)=0;
Vc=-ones(1,n-1);
Vc(1)=0;
Vb(1)=c1;
Vb(n)=c2;
for j=2:m
   for i=2:n-1
      Vb(i)=U(i-1,j-1)+U(i+1,j-1)+s2*U(i,j-1);
   end
   X=trisys(Va,Vd,Vc,Vb);
   U(1:n,j)=X';
end

U=U';
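
Program 10.3 calls a tridiagonal solver named trisys that is not listed in this section. The following is a hypothetical stand-in, named trisys_sketch here, written under the assumption that the four arguments are the subdiagonal, main diagonal, superdiagonal, and right-hand side, in that order (which is how Program 10.3 passes Va, Vd, Vc, and Vb). It uses Gaussian elimination without pivoting, so it presumes a system such as the diagonally dominant one produced by formula (16).

```matlab
% Usage check on a small 4 x 4 system (values chosen only for illustration).
Va = [-1 -1 -1];  Vd = [4 4 4 4];  Vc = [-1 -1 -1];  Vb = [1 2 3 4];
X = trisys_sketch(Va, Vd, Vc, Vb);
A = diag(Vd) + diag(Va,-1) + diag(Vc,1);
disp(norm(A*X' - Vb'));          % residual; should be near round-off error

function X = trisys_sketch(A, D, C, B)
% Hypothetical stand-in for trisys: elimination without pivoting.
% A - subdiagonal (length N-1), D - main diagonal (length N),
% C - superdiagonal (length N-1), B - right-hand side (length N)
N = length(B);
X = zeros(1,N);
for p = 2:N                      % forward elimination
    mult = A(p-1)/D(p-1);
    D(p) = D(p) - mult*C(p-1);
    B(p) = B(p) - mult*B(p-1);
end
X(N) = B(N)/D(N);                % back substitution
for p = N-1:-1:1
    X(p) = (B(p) - C(p)*X(p+1))/D(p);
end
end
```

Placing a local function at the end of a script requires a reasonably recent MATLAB release; in older releases the function would go in its own file.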

Exercises for Parabolic Equations


1. (a) Verify by direct substitution that $u(x,t) = \sin(n\pi x)e^{-4n^2\pi^2 t}$ is a solution to the
heat equation $u_t(x,t) = 4u_{xx}(x,t)$ for each positive integer $n = 1, 2, \ldots$.

(b) Verify by direct substitution that $u(x,t) = \sin(n\pi x)e^{-(cn\pi)^2 t}$ is a solution to
the heat equation $u_t(x,t) = c^2 u_{xx}(x,t)$ for each positive integer $n = 1, 2, \ldots$.

2. What difficulty might occur if $\Delta t = k = h^2/c^2$ is used with formula (7)?

In Exercises 3 and 4, use the forward-difference method to calculate the first three rows of
the approximate solution for the given heat equation. Carry out your calculations by hand
(calculator).

3. $u_t(x,t) = u_{xx}(x,t)$, for $0 < x < 1$ and $0 < t < 0.1$, with the initial condition
$u(x,0) = f(x) = \sin(\pi x)$, for $t = 0$ and $0 \le x \le 1$, and the boundary conditions

$u(0,t) = c_1 = 0$ for $x = 0$ and $0 \le t \le 0.1$,

$u(1,t) = c_2 = 0$ for $x = 1$ and $0 \le t \le 0.1$.

Let $h = 0.2$, $k = 0.02$, and $r = 0.5$.

4. $u_t(x,t) = u_{xx}(x,t)$, for $0 < x < 1$ and $0 < t < 0.1$, with the initial condition
$u(x,0) = f(x) = 1 - |2x - 1|$, for $t = 0$ and $0 \le x \le 1$, and the boundary
conditions

$u(0,t) = c_1 = 0$ for $x = 0$ and $0 \le t \le 0.1$,

$u(1,t) = c_2 = 0$ for $x = 1$ and $0 \le t \le 0.1$.

5. Suppose that $\Delta t = k = h^2/(2c^2)$.

(a) Use this in formula (16) and simplify.

(b) Express the equations in part (a) in the matrix form $AX = B$.

(c) Is the matrix in part (b) strictly diagonally dominant? Why?

6. Show that $u(x,t) = \sum_{j=1}^{N} a_j e^{-(j\pi)^2 t}\sin(j\pi x)$ is a solution to $u_t(x,t) = u_{xx}(x,t)$,
for $0 \le x \le 1$ and $0 < t$, and has the boundary values $u(0,t) = 0$, $u(1,t) = 0$, and
$u(x,0) = \sum_{j=1}^{N} a_j \sin(j\pi x)$.

7. Consider the analytic solution $u(x,t) = \sin(\pi x)e^{-\pi^2 t} + \sin(3\pi x)e^{-(3\pi)^2 t}$ that was
discussed in Example 10.4.

(a) Hold $x$ fixed and determine $\lim_{t \to \infty} u(x,t)$.

(b) What does this mean physically?

8. Suppose that we wish to solve the parabolic equation $u_t(x,t) - u_{xx}(x,t) = k(x)$.

(a) Derive the explicit forward-difference equation for this situation.

(b) Derive the implicit difference formula for this situation.

9. Suppose that equation (11) is used and that $f(x) \ge 0$, $g_1(t) = 0$, and $g_2(t) = 0$.

(a) Show that the maximum value of $u(x_i, t_{j+1})$ in row $j+1$ is less than or equal
to the maximum of $u(x_i, t_j)$ in row $j$.

(b) Make a conjecture concerning the maximum of $u(x_i, t_n)$ in row $n$ as $n$ tends to
infinity.

Algorithms and Programs

In Problems 1 and 2, use Program 10.3 to solve the heat equation $u_t(x,t) = c^2 u_{xx}(x,t)$,
for $0 < x < 1$ and $0 < t < 0.1$, with the initial condition $u(x,0) = f(x)$, for $t = 0$ and
$0 \le x \le 1$, and the boundary conditions

$u(0,t) = c_1 = 0$ for $x = 0$ and $0 \le t \le 0.1$,

$u(1,t) = c_2 = 0$ for $x = 1$ and $0 \le t \le 0.1$,

for the given values. Use the surf and contour commands to plot your approximate
solutions.

1. Use $f(x) = \sin(\pi x) + \sin(2\pi x)$, $h = 0.1$, $k = 0.01$, and $r = 1$.

2. Use $f(x) = 3 - |3x - 1| - |3x - 2|$, $h = 0.1$, $k = 0.01$, and $r = 1$.

3. (a) Modify Programs 10.2 and 10.3 to accept the boundary conditions
$u(0,t) = g_1(t) \ne 0$ and $u(a,t) = g_2(t) \ne 0$.

(b) Use your modified Program 10.3 to solve the heat equations in Problems 1 and
2, but use the boundary conditions

$u(0,t) = g_1(t) = t^2$ for $x = 0$ and $0 \le t \le 0.1$,

$u(1,t) = g_2(t) = e^t$ for $x = 1$ and $0 \le t \le 0.1$,

in place of $c_1 = c_2 = 0$.

(c) Use the surf and contour commands to plot your approximate solutions.

4. Construct programs to implement your explicit forward-difference equations and
implicit difference formula from parts (a) and (b) of Exercise 8, respectively.

5. Use your programs from Problem 4 to solve the heat equation $u_t(x,t) - u_{xx}(x,t) = \sin(x)$,
for $0 < x < 1$ and $0 < t < 0.20$, with the initial condition
$u(x,0) = f(x) = \sin(\pi x) + \sin(3\pi x)$ and the boundary conditions

$u(0,t) = c_1 = 0$ for $x = 0$ and $0 \le t \le 0.20$,
$u(1,t) = c_2 = 0$ for $x = 1$ and $0 \le t \le 0.20$.

Let $h = 0.2$, $k = 0.02$, and $r = 0.5$.
10.3 Elliptic Equations

As examples of elliptic partial differential equations, we consider the Laplace, Poisson,
and Helmholtz equations. Recall that the Laplacian of the function $u(x,y)$ is

(1) $\nabla^2 u = u_{xx} + u_{yy}$.

With this notation, we can write the Laplace, Poisson, and Helmholtz equations in the
following forms:

(2) $\nabla^2 u = 0$                          Laplace's equation,

(3) $\nabla^2 u = g(x,y)$                     Poisson's equation,

(4) $\nabla^2 u + f(x,y)u = g(x,y)$           Helmholtz's equation.

It is often the case that the boundary values for the functions $g$ and $f$ are known at all
points on the sides of a rectangular region $R$ in the plane. In this case, each of these
equations can be solved by the numerical technique known as the finite-difference
method.


The Laplacian Difference Equation

The Laplacian operator must be expressed in a discrete form suitable for numerical
computations. The formula for approximating $f''(x)$ is obtained from

(5) $f''(x) = \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} + O(h^2)$.

When this is applied to the function $u(x,y)$ to approximate $u_{xx}(x,y)$ and $u_{yy}(x,y)$
and the results are added, we obtain

(6) $\nabla^2 u = \frac{u(x+h,y) + u(x-h,y) + u(x,y+h) + u(x,y-h) - 4u(x,y)}{h^2} + O(h^2)$.

Assume that the rectangle $R = \{(x,y) : 0 \le x \le a,\ 0 \le y \le b$, where $b/a = m/n\}$
is subdivided into $n - 1 \times m - 1$ squares with side $h$ (i.e., $a = nh$ and $b = mh$), as
shown in Figure 10.14.

To solve Laplace's equation, we impose the approximation

(7) $\frac{u(x+h,y) + u(x-h,y) + u(x,y+h) + u(x,y-h) - 4u(x,y)}{h^2} = 0$,

which has order of accuracy $O(h^2)$ at all interior grid points $(x,y) = (x_i, y_j)$ for
$i = 2, \ldots, n-1$ and $j = 2, \ldots, m-1$. The grid points are uniformly spaced:
$x_{i+1} = x_i + h$, $x_{i-1} = x_i - h$, $y_{j+1} = y_j + h$, and $y_{j-1} = y_j - h$. Using the
approximation $u_{i,j}$ for $u(x_i, y_j)$, equation (7) can be written in the form

(8) $\nabla^2 u_{i,j} \approx \frac{u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4u_{i,j}}{h^2} = 0$,

which is known as the five-point difference formula for Laplace's equation. This
formula relates the function value $u_{i,j}$ to its four neighboring values $u_{i+1,j}$, $u_{i-1,j}$,
$u_{i,j+1}$, and $u_{i,j-1}$, as shown in Figure 10.15. The term $h^2$ can be eliminated in (8) to
obtain the Laplacian computational formula

(9) $u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4u_{i,j} = 0$.

Setting Up the Linear System

Assume that the values $u(x,y)$ are known at the following boundary grid points:

$u(x_1, y_j) = u_{1,j}$ for $2 \le j \le m-1$ (on the left),

$u(x_i, y_1) = u_{i,1}$ for $2 \le i \le n-1$ (on the bottom),

$u(x_n, y_j) = u_{n,j}$ for $2 \le j \le m-1$ (on the right),

$u(x_i, y_m) = u_{i,m}$ for $2 \le i \le n-1$ (on the top).

Then applying the Laplacian computational formula (9) at each of the interior points
of $R$ will create a linear system of $(n-2)(m-2)$ equations in $(n-2)(m-2)$ unknowns,
which is solved to obtain approximations to $u(x,y)$ at the interior points of $R$. For example,
suppose that the region is a square, that $n = m = 5$, and that the unknown values of
$u(x_i, y_j)$ at the nine interior grid points are labeled $p_1, p_2, \ldots, p_9$ and positioned in
the grid as shown in Figure 10.16.

Figure 10.16 A 5 x 5 grid for boundary values only.

The Laplacian computational formula (9) is applied at each of the interior grid
points, and the result is the system $AP = B$ of nine linear equations:

$-4p_1 + p_2 + p_4 = -u_{2,1} - u_{1,2}$
$p_1 - 4p_2 + p_3 + p_5 = -u_{3,1}$
$p_2 - 4p_3 + p_6 = -u_{4,1} - u_{5,2}$
$p_1 - 4p_4 + p_5 + p_7 = -u_{1,3}$
$p_2 + p_4 - 4p_5 + p_6 + p_8 = 0$
$p_3 + p_5 - 4p_6 + p_9 = -u_{5,3}$
$p_4 - 4p_7 + p_8 = -u_{2,5} - u_{1,4}$
$p_5 + p_7 - 4p_8 + p_9 = -u_{3,5}$
$p_6 + p_8 - 4p_9 = -u_{4,5} - u_{5,4}$.


Example 10.5. Find an approximate solution to Laplace's equation V 2 « = 0 in the 
rectangle R = {(x, y) : 0 £ x < 4,0 < y < 4}, where u{x, y) denotes the temperature at 
the point u , v) and the boundary values are 

h(jc, 0) = 20 and u{x,4) = 180 for 0 < x < 4, 
and 


«(0,y) = 80 and u(4,y) = 0 for 0 < y < 4. 


See Figure 10.17 for the grid to be used.

Applying formula (9) in this case, the linear system AP = B is

$$
\begin{aligned}
-4p_1 + p_2 + p_4 &= -100 \\
p_1 - 4p_2 + p_3 + p_5 &= -20 \\
p_2 - 4p_3 + p_6 &= -20 \\
p_1 - 4p_4 + p_5 + p_7 &= -80 \\
p_2 + p_4 - 4p_5 + p_6 + p_8 &= 0 \\
p_3 + p_5 - 4p_6 + p_9 &= 0 \\
p_4 - 4p_7 + p_8 &= -260 \\
p_5 + p_7 - 4p_8 + p_9 &= -180 \\
p_6 + p_8 - 4p_9 &= -180.
\end{aligned}
$$


The solution vector P can be obtained by Gaussian elimination (or more efficient
schemes can be devised, such as the extension of the tridiagonal algorithm to pentadiagonal
systems). The temperatures at the interior grid points are expressed in vector form as

$P = [p_1 \; p_2 \; p_3 \; p_4 \; p_5 \; p_6 \; p_7 \; p_8 \; p_9]'$

$\;\; = [55.7143 \;\; 43.2143 \;\; 27.1429 \;\; 79.6429 \;\; 70.0000 \;\; 45.3571 \;\; 112.857 \;\; 111.786 \;\; 84.2857]'.$ ■
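The direct approach of Example 10.5 can be sketched in a few lines. The following is a minimal illustration in Python (not the book's MATLAB), assuming constant values on each edge and a row-by-row numbering of the unknowns starting at the lower-left interior point; the helper names `gauss_solve` and `dirichlet_system` are hypothetical and chosen only for this sketch.

```python
def gauss_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        # Pick the row with the largest pivot in column k.
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(m[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (m[k][n] - s) / m[k][k]
    return x


def dirichlet_system(left, right, bottom, top, k=3):
    """Assemble A P = B from formula (9) for a k-by-k block of interior
    points, numbered row by row from the lower-left interior point."""
    size = k * k
    a = [[0.0] * size for _ in range(size)]
    b = [0.0] * size
    for row in range(k):
        for col in range(k):
            idx = row * k + col
            a[idx][idx] = -4.0
            # Each neighbor is either another unknown or a known edge value.
            for dr, dc, edge in ((0, -1, left), (0, 1, right),
                                 (-1, 0, bottom), (1, 0, top)):
                r, c = row + dr, col + dc
                if 0 <= r < k and 0 <= c < k:
                    a[idx][r * k + c] = 1.0
                else:
                    b[idx] -= edge
    return a, b


A, B = dirichlet_system(left=80.0, right=0.0, bottom=20.0, top=180.0)
P = gauss_solve(A, B)
print([round(p, 4) for p in P])
```

Dense elimination is used here only for brevity; it ignores the banded structure that a pentadiagonal solver would exploit.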


Derivative Boundary Conditions 

The Neumann boundary conditions specify the directional derivative of $u(x, y)$ normal
to an edge. For our illustration we will use the zero normal derivative condition

(10)  $\dfrac{\partial}{\partial N} u(x, y) = 0.$

For applications in the area of heat flow, this means that the edge is thermally insulated
and the heat flux across the edge is zero.

Suppose that $x = x_n$ is held fixed and that we are considering the right edge $x = a$
of the rectangle $R = \{(x, y) : 0 \le x \le a, 0 \le y \le b\}$. The normal boundary condition
to be used along this edge is

(11)  $\dfrac{\partial}{\partial x} u(x_n, y_j) = u_x(x_n, y_j) = 0.$

Then the Laplace difference equation for the point $(x_n, y_j)$ is

(12)  $u_{n+1,j} + u_{n-1,j} + u_{n,j+1} + u_{n,j-1} - 4u_{n,j} = 0.$

The value $u_{n+1,j}$ is unknown, because it lies outside the region R. However, we can
use the numerical differentiation formula

$\dfrac{u_{n+1,j} - u_{n-1,j}}{2h} \approx u_x(x_n, y_j) = 0$

and obtain the approximation $u_{n+1,j} \approx u_{n-1,j}$, which has order of accuracy $O(h^2)$.
When this approximation is used in (12), the result is

(13)  $2u_{n-1,j} + u_{n,j+1} + u_{n,j-1} - 4u_{n,j} = 0.$

This formula relates the function value $u_{n,j}$ to its three neighboring values
$u_{n-1,j}$, $u_{n,j+1}$, and $u_{n,j-1}$.

Figure 10.19 The 5 x 5 grid in Example 10.6.

The computational stencils for the other edges can be derived similarly (see Figure 10.18).
The four cases for the Neumann computational stencils are summarized next.

(14)  $2u_{i,2} + u_{i-1,1} + u_{i+1,1} - 4u_{i,1} = 0$  (bottom edge),

(15)  $2u_{i,m-1} + u_{i-1,m} + u_{i+1,m} - 4u_{i,m} = 0$  (top edge),

(16)  $2u_{2,j} + u_{1,j-1} + u_{1,j+1} - 4u_{1,j} = 0$  (left edge),

(17)  $2u_{n-1,j} + u_{n,j-1} + u_{n,j+1} - 4u_{n,j} = 0$  (right edge).

Suppose that the derivative condition $\partial u(x, y)/\partial N = 0$ is used along part of the
boundary of R, and that known boundary values of $u(x, y)$ are used on the other portions
of the boundary; then we have a mixed problem. The equations for determining
approximations for $u(x_i, y_j)$ at boundary points will involve the appropriate Neumann
computational stencils (14) to (17). The Laplacian computational formula (9) is still
used to determine approximations for $u(x_i, y_j)$ at the interior points of R.

Example 10.6. Find an approximate solution to Laplace's equation $\nabla^2 u = 0$ in the
rectangle $R = \{(x, y) : 0 \le x \le 4, 0 \le y \le 4\}$, where $u(x, y)$ denotes the temperature at
the point $(x, y)$ and the boundary values are shown in Figure 10.19:

$u(x, 4) = 180$ for $0 < x < 4$,

$u_y(x, 0) = 0$ for $0 < x < 4$,

$u(0, y) = 80$ for $0 \le y < 4$,

$u(4, y) = 0$ for $0 \le y < 4$.





The Neumann computational formula (14) is applied at the boundary points $q_1$, $q_2$,
and $q_3$, and the Laplace computational stencil (9) is applied at the other points $q_4$, $q_5$,
..., $q_{12}$. The result is a linear system AQ = B involving 12 equations in 12 unknowns:

$$
\begin{aligned}
-4q_1 + q_2 + 2q_4 &= -80 \\
q_1 - 4q_2 + q_3 + 2q_5 &= 0 \\
q_2 - 4q_3 + 2q_6 &= 0 \\
q_1 - 4q_4 + q_5 + q_7 &= -80 \\
q_2 + q_4 - 4q_5 + q_6 + q_8 &= 0 \\
q_3 + q_5 - 4q_6 + q_9 &= 0 \\
q_4 - 4q_7 + q_8 + q_{10} &= -80 \\
q_5 + q_7 - 4q_8 + q_9 + q_{11} &= 0 \\
q_6 + q_8 - 4q_9 + q_{12} &= 0 \\
q_7 - 4q_{10} + q_{11} &= -260 \\
q_8 + q_{10} - 4q_{11} + q_{12} &= -180 \\
q_9 + q_{11} - 4q_{12} &= -180.
\end{aligned}
$$


The solution vector Q can be obtained by Gaussian elimination (or more efficient
schemes can be devised, such as the extension of the tridiagonal algorithm to pentadiagonal
systems). The temperatures at the interior grid points and along the lower edge are
expressed in vector form as

$Q = [q_1 \; q_2 \; q_3 \; q_4 \; q_5 \; q_6 \; q_7 \; q_8 \; q_9 \; q_{10} \; q_{11} \; q_{12}]'$

$\;\; = [71.8218 \;\; 56.8543 \;\; 32.2342 \;\; 75.2165 \;\; 61.6806 \;\; 36.0412 \;\; 87.3636 \;\; 78.6103 \;\; 50.2502 \;\; 115.628 \;\; 115.147 \;\; 86.3492]'.$ ■
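The mixed problem of Example 10.6 can also be checked without assembling the matrix. The sketch below is an illustration in Python under stated assumptions: it uses plain Gauss-Seidel sweeps (a swapped-in technique, not the Gaussian elimination mentioned above), applies the Neumann stencil (14) on the bottom edge and formula (9) at interior points, and starts every unknown at 70.

```python
n = m = 5                                  # 5 x 5 grid with h = 1
# u[i][j] approximates u(x_{i+1}, y_{j+1}) (0-based storage).
u = [[70.0] * m for _ in range(n)]
for j in range(m):
    u[0][j] = 80.0                         # left edge, u(0, y) = 80
    u[n - 1][j] = 0.0                      # right edge, u(4, y) = 0
for i in range(1, n - 1):
    u[i][m - 1] = 180.0                    # top edge, u(x, 4) = 180

for sweep in range(500):
    change = 0.0
    for j in range(0, m - 1):
        for i in range(1, n - 1):
            if j == 0:
                # Neumann stencil (14) on the insulated bottom edge.
                new = (2.0 * u[i][1] + u[i - 1][0] + u[i + 1][0]) / 4.0
            else:
                # Five-point formula (9) at interior points.
                new = (u[i + 1][j] + u[i - 1][j]
                       + u[i][j + 1] + u[i][j - 1]) / 4.0
            change = max(change, abs(new - u[i][j]))
            u[i][j] = new
    if change < 1e-10:
        break

# q1..q12 are ordered row by row, starting on the bottom edge.
Q = [u[i][j] for j in range(0, m - 1) for i in range(1, n - 1)]
print([round(q, 4) for q in Q])
```

Only the vector of grid values is stored, which is the storage advantage that motivates iterative methods.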


Iterative Methods 

The preceding method showed how to solve Laplace's difference equation by constructing
a certain system of linear equations and solving it. The shortcoming of this
method is storage; each interior grid point introduces an equation to be solved. Since
better approximations require a finer mesh grid, many equations might be needed. For
example, the solution of Laplace's equation with the Dirichlet boundary conditions requires
solving a system of $(n-2)(m-2)$ equations. If R is divided into a modest
number of squares, say 10 by 10, the grid is 11 x 11 and there would be 81 equations involving 81 unknowns.
Hence it is sensible to develop techniques that will reduce the amount of storage. An
iterative method would require only the storage of the 121 numerical approximations
throughout the grid.

Let us start with Laplace's difference equation

(18)  $u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4u_{i,j} = 0$

and suppose that the boundary values $u(x, y)$ are known at the following grid points:

(19)
$u(x_1, y_j) = u_{1,j}$ for $2 \le j \le m-1$ (on the left),

$u(x_i, y_1) = u_{i,1}$ for $2 \le i \le n-1$ (on the bottom),

$u(x_n, y_j) = u_{n,j}$ for $2 \le j \le m-1$ (on the right),

$u(x_i, y_m) = u_{i,m}$ for $2 \le i \le n-1$ (on the top).

Equation (18) is rewritten in the following form that is suitable for iteration:

(20)  $u_{i,j} = u_{i,j} + r_{i,j},$

where

(21)  $r_{i,j} = \dfrac{u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4u_{i,j}}{4}$

for $2 \le i \le n-1$ and $2 \le j \le m-1$.

Starting values for all interior grid points must be supplied. The constant K, which
is the average of the 2n + 2m - 4 boundary values given in (19), can be used for this
purpose. One iteration consists of sweeping formula (20) throughout all of the interior
points of the grid. Successive iterations sweep the interior of the grid with the Laplace
iterative operator (20) until the residual term $r_{i,j}$ on the right side of equation (20) is
"reduced to zero" (i.e., $|r_{i,j}| < \epsilon$ holds for each $2 \le i \le n-1$ and $2 \le j \le m-1$).
The speed of convergence for reducing all the residuals $\{r_{i,j}\}$ to zero is increased by
using the method called successive overrelaxation (SOR). The SOR method uses the
iteration formula

(22)  $u_{i,j} = u_{i,j} + \omega \left( \dfrac{u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4u_{i,j}}{4} \right) = u_{i,j} + \omega r_{i,j},$

where the parameter $\omega$ lies in the range $1 \le \omega < 2$. In the SOR method, formula (22)
is swept across the grid until $|r_{i,j}| < \epsilon$. The optimal choice for $\omega$ is based on the study
of eigenvalues of iteration matrices for linear systems and is given in this case by the
formula

(23)  $\omega = \dfrac{4}{2 + \sqrt{4 - \left( \cos\left(\dfrac{\pi}{n-1}\right) + \cos\left(\dfrac{\pi}{m-1}\right) \right)^2}}.$

If the Neumann boundary condition is specified on some portion of the boundary,
we must rewrite equations (14) through (17) in a form that is suitable for iteration. The
four cases are summarized next and include the relaxation parameter $\omega$.

(24)  $u_{i,1} = u_{i,1} + \omega \left( \dfrac{2u_{i,2} + u_{i-1,1} + u_{i+1,1} - 4u_{i,1}}{4} \right)$  (bottom edge),

(25)  $u_{i,m} = u_{i,m} + \omega \left( \dfrac{2u_{i,m-1} + u_{i-1,m} + u_{i+1,m} - 4u_{i,m}}{4} \right)$  (top edge),

(26)  $u_{1,j} = u_{1,j} + \omega \left( \dfrac{2u_{2,j} + u_{1,j-1} + u_{1,j+1} - 4u_{1,j}}{4} \right)$  (left edge),

(27)  $u_{n,j} = u_{n,j} + \omega \left( \dfrac{2u_{n-1,j} + u_{n,j-1} + u_{n,j+1} - 4u_{n,j}}{4} \right)$  (right edge).


Example 10.7. Use an iterative method to compute an approximate solution to Laplace's
equation $\nabla^2 u = 0$ in $R = \{(x, y) : 0 \le x \le 4, 0 \le y \le 4\}$, where the boundary values are

$u(x, 0) = 20$ and $u(x, 4) = 180$ for $0 < x < 4$,

and

$u(0, y) = 80$ and $u(4, y) = 0$ for $0 < y < 4$.

For illustration, the square is divided into 64 squares with sides $\Delta x = h = 0.5$ and
$\Delta y = h = 0.5$. The initial value at the interior grid points was set at $u_{i,j} = 70$ for
each i = 2, ..., 8 and j = 2, ..., 8. The SOR method was used with the parameter
$\omega = 1.44646$ (substitute n = 9 and m = 9 in formula (23)). After 19 iterations, the residual
was uniformly reduced (i.e., $|r_{i,j}| \le 0.000606 < 0.001$). The resulting approximations are
given in Table 10.6. Because of the discontinuity of the boundary function at the corners,
the boundary values $u_{1,1} = 50$, $u_{9,1} = 10$, $u_{1,9} = 130$, and $u_{9,9} = 90$ have been introduced
in Table 10.6 and Figure 10.20; they were not used in the computations at the interior grid
points. A three-dimensional presentation of the data in Table 10.6 is given in Figure 10.20. ■
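The SOR sweep of Example 10.7 can be sketched as follows. This is an illustrative Python version, not Program 10.4 itself; the function name `sor_dirichlet` and its arguments are assumptions made for the sketch, and the stopping test uses the unrelaxed residual (21), so the sweep count need not match the 19 iterations reported above.

```python
import math


def sor_dirichlet(n, m, left, right, bottom, top, start,
                  tol=1e-3, max_iter=200):
    """SOR sweeps of formula (22) on an n-by-m grid with constant edge
    values. u[i][j] holds the value at (x_{i+1}, y_{j+1}); the four
    corner entries are never used by the five-point stencil."""
    u = [[start] * m for _ in range(n)]
    for j in range(1, m - 1):
        u[0][j], u[n - 1][j] = left, right
    for i in range(1, n - 1):
        u[i][0], u[i][m - 1] = bottom, top
    # Optimal relaxation parameter, formula (23).
    c = math.cos(math.pi / (n - 1)) + math.cos(math.pi / (m - 1))
    omega = 4.0 / (2.0 + math.sqrt(4.0 - c * c))
    for count in range(1, max_iter + 1):
        worst = 0.0
        for j in range(1, m - 1):
            for i in range(1, n - 1):
                r = (u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1]
                     - 4.0 * u[i][j]) / 4.0
                u[i][j] += omega * r
                worst = max(worst, abs(r))
        if worst < tol:
            return u, omega, count
    return u, omega, max_iter


u, omega, sweeps = sor_dirichlet(9, 9, left=80.0, right=0.0,
                                 bottom=20.0, top=180.0, start=70.0)
print(round(omega, 5), sweeps, round(u[4][4], 3))
```

By symmetry the value at the center of the square should be close to 70, the average of the four edge values, which gives a quick sanity check on the sweep.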


Example 10.8. Use an iterative method to compute an approximate solution to Laplace's
equation $\nabla^2 u = 0$ in $R = \{(x, y) : 0 \le x \le 4, 0 \le y \le 4\}$, where the boundary values
are

$u(x, 4) = 180$ for $y = 4$ and $0 < x < 4$,

$u_y(x, 0) = 0$ for $y = 0$ and $0 < x < 4$,

$u(0, y) = 80$ for $x = 0$ and $0 \le y < 4$,

$u(4, y) = 0$ for $x = 4$ and $0 \le y < 4$.

For illustration, the square is divided into 64 squares with sides $\Delta x = h = 0.5$ and
$\Delta y = h = 0.5$. Starting values using linear interpolation were used along the edge where
$y = y_1 = 0$. The initial value at the interior grid points was set at $u_{i,j} = 70$ for each
i = 2, ..., 8 and j = 2, ..., 8. Then the SOR method was employed with the parameter
$\omega = 1.44646$ (as in Example 10.7). After 29 iterations, the residual was uniformly reduced
(i.e., $|r_{i,j}| \le 0.000998 < 0.001$). The resulting approximations are given in Table 10.7.
Because of the discontinuity of the boundary functions at the corners, the boundary values
$u_{1,9} = 130$ and $u_{9,9} = 90$ have been introduced in Table 10.7 and Figure 10.21; they were
not used in the computations at the interior grid points. A three-dimensional presentation
of the data in Table 10.7 is given in Figure 10.21. ■

Figure 10.20 u = u(x, y) with Dirichlet boundary values.

Figure 10.21 u = u(x, y) for a mixed problem.


Poisson’s and Helmholtz’s Equations 

Consider Poisson's equation

(28)  $\nabla^2 u = g(x, y).$

Using the notation $g_{i,j} = g(x_i, y_j)$, the generalization of formula (20) for solving (28)
over the rectangular grid is

(29)  $u_{i,j} = u_{i,j} + \dfrac{u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4u_{i,j} - h^2 g_{i,j}}{4}.$

Consider Helmholtz's equation

(30)  $\nabla^2 u + f(x, y)u = g(x, y).$

Using the notation $f_{i,j} = f(x_i, y_j)$, the generalization of formula (20) for solving (30)
over the rectangular grid is

(31)  $u_{i,j} = u_{i,j} + \dfrac{u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - (4 - h^2 f_{i,j})u_{i,j} - h^2 g_{i,j}}{4 - h^2 f_{i,j}}.$

These formulas are explored in greater detail in the exercises.
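Formula (29) can be exercised on a problem whose answer is known. The Python sketch below is an illustration under stated assumptions: the test problem $u = x^2 + y^2$ with $g = 4$ is chosen here (it is not from the text) because the five-point formula is exact for quadratics, and the function name `poisson_sweeps` is hypothetical.

```python
def poisson_sweeps(n, h, g, boundary, tol=1e-12, max_iter=5000):
    """Sweep formula (29) over an n-by-n grid on the square [0, (n-1)h]^2.
    g(x, y) is the right-hand side; boundary(x, y) supplies edge values."""
    xs = [i * h for i in range(n)]
    u = [[boundary(x, y) for y in xs] for x in xs]
    # Interior starting values are set to zero; only the edges are kept.
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            u[i][j] = 0.0
    for _ in range(max_iter):
        worst = 0.0
        for j in range(1, n - 1):
            for i in range(1, n - 1):
                r = (u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1]
                     - 4.0 * u[i][j] - h * h * g(xs[i], xs[j])) / 4.0
                u[i][j] += r
                worst = max(worst, abs(r))
        if worst < tol:
            break
    return xs, u


def exact(x, y):
    # Assumed test solution: its Laplacian is the constant 4.
    return x * x + y * y


xs, u = poisson_sweeps(5, 0.25, lambda x, y: 4.0, exact)
err = max(abs(u[i][j] - exact(xs[i], xs[j]))
          for i in range(5) for j in range(5))
print(err)
```

For right-hand sides that are not constant, the computed values would differ from the true solution by the truncation error of the five-point formula.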

Improvements 

A modification of (8) that can be employed is the nine-point difference formula for
Laplace's equation:

(32)  $\nabla^2 u_{i,j} = \dfrac{1}{6h^2} \left( u_{i+1,j-1} + u_{i-1,j-1} + u_{i+1,j+1} + u_{i-1,j+1} + 4u_{i+1,j} + 4u_{i-1,j} + 4u_{i,j+1} + 4u_{i,j-1} - 20u_{i,j} \right) = 0.$

The truncation error for the nine-point difference formula is of the order $O(h^2)$ when
it is used to solve the Poisson or Helmholtz equation; thus there is no improvement if
the nine-point difference formula is used instead of the five-point difference formula.
However, when the nine-point formula is used to solve Laplace's equation $\nabla^2 u = 0$,
the truncation error is of the order $O(h^6)$ and there is an advantage to using the
nine-point difference formula.
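The advantage for harmonic functions can be seen numerically. The Python sketch below is only an illustration: it applies the five-point and nine-point difference quotients to the harmonic polynomial $u(x, y) = x^4 - 6x^2y^2 + y^4$ (which appears in the exercises as an exact solution); the sample point (0.3, 0.7) and the step h = 0.1 are arbitrary choices made here.

```python
def five_point(u, x, y, h):
    """Five-point approximation to the Laplacian of u at (x, y)."""
    return (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h)
            - 4.0 * u(x, y)) / (h * h)


def nine_point(u, x, y, h):
    """Nine-point approximation to the Laplacian of u at (x, y)."""
    corners = (u(x + h, y - h) + u(x - h, y - h)
               + u(x + h, y + h) + u(x - h, y + h))
    edges = u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h)
    return (corners + 4.0 * edges - 20.0 * u(x, y)) / (6.0 * h * h)


def harmonic(x, y):
    return x**4 - 6.0 * x**2 * y**2 + y**4


h = 0.1
d5 = five_point(harmonic, 0.3, 0.7, h)
d9 = nine_point(harmonic, 0.3, 0.7, h)
print(d5, d9)
```

For this polynomial the five-point quotient leaves a residual of $4h^2$, while the nine-point quotient vanishes up to rounding error.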

Program 10.4 (Dirichlet Method for Laplace's Equation). To approximate the
solution of $u_{xx}(x, y) + u_{yy}(x, y) = 0$ over $R = \{(x, y) : 0 \le x \le a, 0 \le y \le b\}$
with $u(x, 0) = f_1(x)$, $u(x, b) = f_2(x)$, for $0 \le x \le a$, and $u(0, y) = f_3(y)$,
$u(a, y) = f_4(y)$, for $0 \le y \le b$. It is assumed that $\Delta x = \Delta y = h$ and that integers
n and m exist so that a = nh and b = mh.

function U=dirich(f1,f2,f3,f4,a,b,h,tol,max1)
%Input  - f1,f2,f3,f4 are boundary functions input as strings
%       - a and b right end points of [0,a] and [0,b]
%       - h step size
%       - tol is the tolerance
%       - max1 is the maximum number of iterations
%Output - U solution matrix; analogous to Table 10.6
%Initialize parameters and U
n=fix(a/h)+1;
m=fix(b/h)+1;
ave=(a*(feval(f1,0)+feval(f2,0)) ...
   +b*(feval(f3,0)+feval(f4,0)))/(2*a+2*b);
U=ave*ones(n,m);
%Boundary conditions
U(1,1:m)=feval(f3,0:h:(m-1)*h)';
U(n,1:m)=feval(f4,0:h:(m-1)*h)';
U(1:n,1)=feval(f1,0:h:(n-1)*h);
U(1:n,m)=feval(f2,0:h:(n-1)*h);
%Average the two adjacent edge values at each corner
U(1,1)=(U(1,2)+U(2,1))/2;
U(1,m)=(U(1,m-1)+U(2,m))/2;
U(n,1)=(U(n-1,1)+U(n,2))/2;
U(n,m)=(U(n-1,m)+U(n,m-1))/2;
%SOR parameter
w=4/(2+sqrt(4-(cos(pi/(n-1))+cos(pi/(m-1)))^2));
%Refine approximations and sweep operator throughout
%the grid
err=1;
cnt=0;
while((err>tol)&(cnt<=max1))
   err=0;
   for j=2:m-1
      for i=2:n-1
         relx=w*(U(i,j+1)+U(i,j-1)+U(i+1,j)+U(i-1,j)-4*U(i,j))/4;
         U(i,j)=U(i,j)+relx;
         if (err<=abs(relx))
            err=abs(relx);
         end
      end
   end
   cnt=cnt+1;
end
U=flipud(U');


Exercises for Elliptic Equations 


1. (a) Determine the system of four equations in the four unknowns $p_1$, $p_2$, $p_3$, and $p_4$
for computing approximations for the harmonic function $u(x, y)$ in the rectangle
$R = \{(x, y) : 0 \le x \le 3, 0 \le y \le 3\}$ (see Figure 10.22). The boundary values
are

$u(x, 0) = 10$ and $u(x, 3) = 90$ for $0 < x < 3$,
$u(0, y) = 70$ and $u(3, y) = 0$ for $0 < y < 3$.

Figure 10.22 The grid for Exercise 1.

Figure 10.23 The grid for Exercise 2.

(b) Solve the equations in part (a) for $p_1$, $p_2$, $p_3$, and $p_4$.

2. (a) Determine the system of six equations in the six unknowns $q_1$, $q_2$, ..., $q_6$ for
computing approximations for the harmonic function $u(x, y)$ in the rectangle
$R = \{(x, y) : 0 \le x \le 3, 0 \le y \le 3\}$ (see Figure 10.23). The boundary values
are

$u(x, 3) = 90$ and $u_y(x, 0) = 90$ for $0 < x < 3$,
$u(0, y) = 70$ and $u(3, y) = 0$ for $0 < y < 3$.

(b) Solve the equations in part (a) for $q_1$, $q_2$, ..., $q_6$.

3. (a) Show that $u(x, y) = a_1 \sin(x) \sinh(y) + b_1 \sinh(x) \sin(y)$ is a solution of
Laplace's equation.

(b) Show that $u(x, y) = a_n \sin(nx) \sinh(ny) + b_n \sinh(nx) \sin(ny)$ is a solution of
Laplace's equation for each positive integer n = 1, 2, ....

4. Let $u(x, y) = x^2 - y^2$. Determine the quantities $u(x + h, y)$, $u(x - h, y)$, $u(x, y + h)$,
and $u(x, y - h)$, substitute them into equation (7), and simplify.

5. (a) Suppose that u has the form $u(x, y) = ax^2 + bxy + cy^2 + dx + ey + f$. Find
a relationship among the coefficients which guarantees that $u_{xx} + u_{yy} = 0$.

(b) Suppose that u has the form given in part (a). Find a relationship among the
coefficients ...

(c) Find the coefficients of the polynomial $u(x, y)$ given in part (a) that satisfy
the partial differential equation in part (a) and also the boundary conditions
$u(x, 0) = 0$ and $u(x, \beta) = 0$.

(d) Find the coefficients of the polynomial $u(x, y)$ given in part (a) that satisfy
the partial differential equation in part (b) and also the boundary conditions
$u(x, 0) = 0$ and $u(x, \beta) = 0$.

6. Solve $u_{xx} + u_{yy} = -4u$ over $R = \{(x, y) : 0 \le x \le 1, 0 \le y \le 1\}$ with the boundary
values

$u(x, y) = \cos(2x) + \sin(2y).$

7. Determine the system of four equations in the four unknowns $p_1$, $p_2$, $p_3$, and $p_4$ for
implementing the Laplace nine-point difference equation on the 4 x 4 grid shown in
Figure 10.24.


Algorithms and Programs 

1. (a) Use Program 10.4 to compute approximations for the harmonic function $u(x, y)$
in the rectangle $R = \{(x, y) : 0 \le x \le 1.5, 0 \le y \le 1.5\}$; use h = 0.5. The
boundary values are

$u(x, 0) = x^4$ and $u(x, 1.5) = x^4 - 13.5x^2 + 5.0625$ for $0 \le x \le 1.5$,
$u(0, y) = y^4$ and $u(1.5, y) = y^4 - 13.5y^2 + 5.0625$ for $0 \le y \le 1.5$.

(b) Use the surf command to plot your approximation from part (a) and compare
it with the exact solution $u(x, y) = x^4 - 6x^2y^2 + y^4$.

2. Modify Program 9.11 (Tridiagonal Systems) to solve a pentadiagonal system.

3. (a) Use a 5 x 5 grid similar to that in Example 10.5 and determine the system of
nine equations in the nine unknowns $p_1$, $p_2$, $p_3$, ..., $p_9$ for computing approximations
for the harmonic function $u(x, y)$ in the rectangle $R = \{(x, y) : 0 \le x \le 4, 0 \le y \le 4\}$.
The boundary values are

$u(x, 0) = 10$ and $u(x, 4) = 120$ for $0 < x < 4$,
$u(0, y) = 90$ and $u(4, y) = 40$ for $0 < y < 4$.

(b) Use your modification of Program 9.11 to solve for $p_1$, $p_2$, ..., $p_9$.

(c) Use Program 10.4 to solve for the approximations.

(d) Use a 9 x 9 grid similar to that in Example 10.7 and Program 10.4 to solve for
the approximations.

4. (a) Use a 5 x 5 grid similar to that in Example 10.6 and determine the system of
12 equations in the 12 unknowns $q_1$, $q_2$, ..., $q_{12}$ for computing approximations
for the harmonic function $u(x, y)$ in the rectangle $R = \{(x, y) : 0 \le x \le 4, 0 \le y \le 4\}$.
The boundary values are

$u(x, 4) = 120$ and $u_y(x, 0) = 0$ for $0 < x < 4$,
$u(0, y) = 90$ and $u(4, y) = 40$ for $0 < y < 4$.

(b) Use your modification of Program 9.11 to solve for $q_1$, $q_2$, ..., $q_{12}$.

(c) Modify Program 10.4 to solve for the approximations.

(d) Use a 9 x 9 grid similar to that in Example 10.8 and a modification of Program 10.4
to solve for the approximations.

5. (a) Using a 5 x 5 grid, derive the nine equations involving the nine unknowns $p_1$,
$p_2$, $p_3$, ..., $p_9$ for computing approximations for the solution $u(x, y)$ to Poisson's
equation with $g(x, y) = 2$ in the rectangle $R = \{(x, y) : 0 \le x \le 1, 0 \le y \le 1\}$.
The boundary values are

$u(x, 0) = x^2$ and $u(x, 1) = (x - 1)^2$ for $0 \le x \le 1$,
$u(0, y) = y^2$ and $u(1, y) = (y - 1)^2$ for $0 \le y \le 1$.

(b) Use your modification of Program 9.11 to solve for $p_1$, $p_2$, ..., $p_9$.

(c) Modify Program 10.4 to solve for the approximations.

(d) Use a 9 x 9 grid and your modification of Program 10.4 to solve for the approximations.

6. (a) Using a 5 x 5 grid, derive the nine equations involving the nine unknowns $p_1$,
$p_2$, $p_3$, ..., $p_9$ for computing approximations for the solution $u(x, y)$ to Poisson's
equation with $g(x, y) = y$ in the rectangle $R = \{(x, y) : 0 \le x \le 1, 0 \le y \le 1\}$.
The boundary values are

$u(x, 0) = x^3$ and $u(x, 1) = x^3$ for $0 \le x \le 1$,
$u(0, y) = 0$ and $u(1, y) = 1$ for $0 \le y \le 1$.

(b) Use your modification of Program 9.11 to solve for $p_1$, $p_2$, ..., $p_9$.

(c) Modify Program 10.4 to solve for the approximations.

(d) Use a 9 x 9 grid and your modification of Program 10.4 to solve for the approximations.

Eigenvalues and Eigenvectors


The design of certain engineering systems involves the maximum stress theory of
failure. This theory is based on the assumption that the maximum principal stress
acting on a body determines its failure. The related mathematical result is the principal
axes theorem for a linear transformation Y = AX. In two dimensions there exist
basis vectors $U_1$ and $U_2$ so that the effect of this transformation is to stretch space in
the directions parallel to $U_1$ and $U_2$ by the amounts $\lambda_1$ and $\lambda_2$, respectively. For
the symmetric matrix

$A = \begin{bmatrix} 3.8 & 0.6 \\ 0.6 & 2.2 \end{bmatrix},$

the principal directions are $U_1 = [3 \;\; 1]'$ and $U_2 = [-1 \;\; 3]'$, with corresponding
eigenvalues $\lambda_1 = 4$ and $\lambda_2 = 2$, respectively. Images of these vectors are
$V_1 = AU_1 = [12 \;\; 4]' = 4[3 \;\; 1]'$ and $V_2 = AU_2 = [-2 \;\; 6]' = 2[-1 \;\; 3]'$. This
transformation stretches the quarter-circle shown in Figure 11.1(a) into the quarter-ellipse
shown in Figure 11.1(b).

Figure 11.1 (a) Preimages $U_1 = [3 \;\; 1]'$ and $U_2 = [-1 \;\; 3]'$ for the transformation Y = AX.
(b) The image vectors $V_1 = AU_1 = [12 \;\; 4]'$ and $V_2 = AU_2 = [-2 \;\; 6]'$.


11.1 Homogeneous Systems: The Eigenvalue Problem 

Background 

We will now review some ideas from linear algebra. Proofs of the theorems are either
left as exercises or can be found in any standard text on linear algebra (see Reference [132]).

In Chapter 3 we saw how to solve n linear equations in n unknowns. It was assumed
that the determinant of the matrix was nonzero and hence that the solution was
unique. In the case of a homogeneous system AX = 0, if $\det(A) \ne 0$, the unique
solution is the trivial solution X = 0. If $\det(A) = 0$, there exist nontrivial solutions to
AX = 0. Suppose that $\det(A) = 0$, and consider solutions to the homogeneous linear
system

(1)
$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0 \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= 0.
\end{aligned}
$$

The system of equations (1) always has the trivial solution $x_1 = 0, x_2 = 0, \ldots, x_n = 0$.
Gaussian elimination can be used to obtain a solution by forming a set of relationships
between the variables.

Example 11.1. Find the nontrivial solutions to the homogeneous system

$$
\begin{aligned}
x_1 + 2x_2 - x_3 &= 0 \\
2x_1 + x_2 + x_3 &= 0 \\
5x_1 + 4x_2 + x_3 &= 0.
\end{aligned}
$$

Use Gaussian elimination to eliminate $x_1$ and the result is

$$
\begin{aligned}
x_1 + 2x_2 - x_3 &= 0 \\
-3x_2 + 3x_3 &= 0 \\
-6x_2 + 6x_3 &= 0.
\end{aligned}
$$

Since the third equation is a multiple of the second equation, this system reduces to two
equations in three unknowns:

$$
\begin{aligned}
x_1 + x_2 &= 0 \\
-x_2 + x_3 &= 0.
\end{aligned}
$$

We can select one unknown and use it as a parameter. For instance, let $x_3 = t$; then
the second equation implies that $x_2 = t$ and the first equation is used to compute $x_1 = -t$.
Therefore, the solution can be expressed as the set of relations

$x_1 = -t, \quad x_2 = t, \quad x_3 = t,$

where t is any real number. ■
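A quick numerical check of the one-parameter family found in Example 11.1 is sketched below in Python; the function name `residuals` and the sample values of t are illustrative choices, not part of the text.

```python
def residuals(t):
    """Left-hand sides of the three equations at x = (-t, t, t)."""
    x1, x2, x3 = -t, t, t
    return (x1 + 2 * x2 - x3,
            2 * x1 + x2 + x3,
            5 * x1 + 4 * x2 + x3)


samples = [residuals(t) for t in (-2, 1, 7)]
print(samples)
```

Every residual is zero for each t, as expected for a solution of the homogeneous system.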

Definition 11.1 (Linear Independence). The vectors $U_1, U_2, \ldots, U_n$ are said to
be linearly independent if the equation

(2)  $c_1U_1 + c_2U_2 + \cdots + c_nU_n = 0$

implies that $c_1 = 0, c_2 = 0, \ldots, c_n = 0$. If the vectors are not linearly independent
they are said to be linearly dependent. In other words, the vectors are linearly dependent
if there exists a set of numbers $\{c_1, c_2, \ldots, c_n\}$, not all zero, such that equation (2)
holds. ▲

Two vectors in $\Re^2$ are linearly independent if and only if they are not parallel.
Three vectors in $\Re^3$ are linearly independent if and only if they do not lie in the same
plane.

Theorem 11.1. The vectors $U_1, U_2, \ldots, U_n$ are linearly dependent if and only if at
least one of them is a linear combination of the others.

A desirable feature for a vector space is the ability to express each vector as a linear
combination of vectors chosen from a small subset of vectors. This motivates the next
definition.

Definition 11.2 (Basis). Suppose that $S = \{U_1, U_2, \ldots, U_m\}$ is a set of m vectors in
$\Re^n$. The set S is called a basis for $\Re^n$ if for every vector X in $\Re^n$ there exists a unique
set of scalars $\{c_1, c_2, \ldots, c_m\}$ so that X can be expressed as the linear combination

(3)  $X = c_1U_1 + c_2U_2 + \cdots + c_mU_m.$ ▲

Theorem 11.2. In $\Re^n$, any set of n linearly independent vectors forms a basis of $\Re^n$.
Each vector X in $\Re^n$ is uniquely expressed as a linear combination of the basis vectors,
as shown in equation (3).

Theorem 11.3. Let $K_1, K_2, \ldots, K_m$ be vectors in $\Re^n$.

(4)  If m > n, then the vectors are linearly dependent.

(5)  If m = n, the vectors are linearly dependent if and only if $\det(K) = 0$,
where $K = [K_1 \;\; K_2 \;\; \ldots \;\; K_m]$.


Eigenvalues 

Applications of mathematics sometimes encounter the following questions: What are
the singularities of $A - \lambda I$, where $\lambda$ is a parameter? What is the behavior of the
sequence of vectors $\{A^jX_0\}_{j=0}^{\infty}$? What are the geometric features of a linear
transformation? Solutions for problems in many different disciplines, such as economics,
engineering, and physics, can involve ideas related to these questions. The theory
of eigenvalues and eigenvectors is powerful enough to help solve these otherwise
intractable problems.

Let A be a square matrix of dimension n x n and let X be a vector of dimension n.
The product Y = AX can be viewed as a linear transformation from n-dimensional
space into itself. We want to find scalars $\lambda$ for which there exists a nonzero vector X
such that

(6)  $AX = \lambda X;$

that is, the linear transformation T(X) = AX maps X onto the multiple $\lambda X$. When
this occurs, we call X an eigenvector that corresponds to the eigenvalue $\lambda$, and together
they form the eigenpair $\lambda$, X for A. In general, the scalar $\lambda$ and vector X can involve
complex numbers. For simplicity, most of our illustrations will involve real calculations.
However, the techniques are easily extended to the complex case. The identity
matrix I can be used to write equation (6) in the standard form for a linear system as

(7)  $(A - \lambda I)X = 0.$

The significance of equation (7) is that the product of the matrix $(A - \lambda I)$ and the
nonzero vector X is the zero vector! According to Theorem 3.5, this linear system has
nontrivial solutions if and only if the matrix $A - \lambda I$ is singular, that is,

(8)  $\det(A - \lambda I) = 0.$

This determinant can be written in the form

(9)  $\begin{vmatrix} a_{11} - \lambda & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} - \lambda & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda \end{vmatrix} = 0.$

When the determinant in (9) is expanded, it becomes a polynomial of degree n, which
is called the characteristic polynomial

(10)  $p(\lambda) = \det(A - \lambda I) = (-1)^n \left( \lambda^n + c_1\lambda^{n-1} + c_2\lambda^{n-2} + \cdots + c_{n-1}\lambda + c_n \right).$

There exist exactly n roots (not necessarily distinct) of a polynomial of degree n.
Each root $\lambda$ can be substituted into equation (7) to obtain an underdetermined system
of equations that has a corresponding nontrivial solution vector X. If $\lambda$ is real, a real
eigenvector X can be constructed. For emphasis, we state the following definitions.

Definition 11.3 (Eigenvalue). If A is an n x n real matrix, then its n eigenvalues $\lambda_1$,
$\lambda_2$, ..., $\lambda_n$ are the real and complex roots of the characteristic polynomial

(11)  $p(\lambda) = \det(A - \lambda I).$ ▲

Definition 11.4 (Eigenvector). If $\lambda$ is an eigenvalue of A and the nonzero vector V
has the property that

(12)  $AV = \lambda V,$

then V is called an eigenvector of A corresponding to the eigenvalue $\lambda$. ▲

The characteristic polynomial (11) can be factored in the form

(13)  $p(\lambda) = (-1)^n (\lambda - \lambda_1)^{m_1} (\lambda - \lambda_2)^{m_2} \cdots (\lambda - \lambda_k)^{m_k},$

where $m_j$ is called the multiplicity of the eigenvalue $\lambda_j$. The sum of the multiplicities
of all eigenvalues is n; that is,

$n = m_1 + m_2 + \cdots + m_k.$

The next three results concern the existence of eigenvectors.

Theorem 11.4. (a) For each distinct eigenvalue $\lambda$ there exists at least one eigenvector
V corresponding to $\lambda$.

(b) If $\lambda$ has multiplicity r, then there exist at most r linearly independent eigenvectors
$V_1, V_2, \ldots, V_r$ that correspond to $\lambda$.

Theorem 11.5. Suppose that A is a square matrix and $\lambda_1, \lambda_2, \ldots, \lambda_k$ are distinct
eigenvalues of A, with associated eigenvectors $V_1, V_2, \ldots, V_k$, respectively; then
$\{V_1, V_2, \ldots, V_k\}$ is a set of linearly independent vectors.

Theorem 11.6. If the eigenvalues of the n x n matrix A are all distinct, then there
exist n linearly independent eigenvectors $V_j$, for j = 1, 2, ..., n.

Theorem 11.4 is usually applied for hand computations in the following manner.
The eigenvalue $\lambda$ of multiplicity $r \ge 1$ is substituted into the equation

(14)  $(A - \lambda I)V = 0.$

Then Gaussian elimination can be performed to obtain the Gauss reduced form, which
will involve n - k equations in n unknowns, where $1 \le k \le r$. Hence there are k
free variables to choose. The free variables can be selected in a judicious manner to
produce k linearly independent solution vectors $V_1, V_2, \ldots, V_k$ that correspond to $\lambda$.

Example 11.2. Find the eigenpairs $\lambda_j$, $V_j$ for the matrix

$A = \begin{bmatrix} 3 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 3 \end{bmatrix}.$

Also, show that the eigenvectors are linearly independent.

The characteristic equation $\det(A - \lambda I) = 0$ is

(15)  $\begin{vmatrix} 3 - \lambda & -1 & 0 \\ -1 & 2 - \lambda & -1 \\ 0 & -1 & 3 - \lambda \end{vmatrix} = -\lambda^3 + 8\lambda^2 - 19\lambda + 12 = 0,$

which can be written as $-(\lambda - 1)(\lambda - 3)(\lambda - 4) = 0$. Therefore, the three eigenvalues are
$\lambda_1 = 1$, $\lambda_2 = 3$, and $\lambda_3 = 4$.

Case (i): Substitute $\lambda_1 = 1$ into equation (14) and obtain

$$
\begin{aligned}
2x_1 - x_2 &= 0 \\
-x_1 + x_2 - x_3 &= 0 \\
-x_2 + 2x_3 &= 0.
\end{aligned}
$$

Since the sum of the first equation plus two times the second equation plus the third
equation is identically zero, the system can be reduced to two equations in three unknowns:

$$
\begin{aligned}
2x_1 - x_2 &= 0 \\
-x_2 + 2x_3 &= 0.
\end{aligned}
$$

Choose $x_2 = 2a$, where a is an arbitrary constant; then the first and second equations are
used to compute $x_1 = a$ and $x_3 = a$, respectively. Thus the first eigenpair is $\lambda_1 = 1$,
$V_1 = [a \;\; 2a \;\; a]' = a[1 \;\; 2 \;\; 1]'$.

Case (ii): Substitute $\lambda_2 = 3$ into equation (14) and obtain

$$
\begin{aligned}
-x_2 &= 0 \\
-x_1 - x_2 - x_3 &= 0 \\
-x_2 &= 0.
\end{aligned}
$$

This is equivalent to the system of two equations

$$
\begin{aligned}
x_1 + x_3 &= 0 \\
x_2 &= 0.
\end{aligned}
$$

Choose $x_1 = b$, where b is an arbitrary constant, and compute $x_3 = -b$. Hence the second
eigenpair is $\lambda_2 = 3$, $V_2 = [b \;\; 0 \;\; -b]' = b[1 \;\; 0 \;\; -1]'$.

Case (iii): Substitute $\lambda_3 = 4$ into (14); the result is

$$
\begin{aligned}
-x_1 - x_2 &= 0 \\
-x_1 - 2x_2 - x_3 &= 0 \\
-x_2 - x_3 &= 0.
\end{aligned}
$$

This is equivalent to the two equations

$$
\begin{aligned}
x_1 + x_2 &= 0 \\
x_2 + x_3 &= 0.
\end{aligned}
$$

Choose $x_3 = c$, where c is a constant, then use the second equation to compute $x_2 = -c$.
Then use the first equation to get $x_1 = c$. Thus the third eigenpair is $\lambda_3 = 4$,
$V_3 = [c \;\; -c \;\; c]' = c[1 \;\; -1 \;\; 1]'$.

To prove that the vectors are linearly independent, it suffices to apply Theorem 11.5.
However, it is beneficial to review techniques from linear algebra and use Theorem 11.3.
Form the determinant

$\det([V_1 \;\; V_2 \;\; V_3]) = \begin{vmatrix} a & b & c \\ 2a & 0 & -c \\ a & -b & c \end{vmatrix} = -6abc.$

Since $\det([V_1 \;\; V_2 \;\; V_3]) \ne 0$, Theorem 11.3 implies that the vectors $V_1$, $V_2$, and $V_3$ are
linearly independent. ■
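The hand computation in Example 11.2 can be verified directly. The Python sketch below is a small check with a = b = c = 1; the helper names `matvec` and `det3` are illustrative only.

```python
A = [[3, -1, 0],
     [-1, 2, -1],
     [0, -1, 3]]
# Eigenpairs from Example 11.2 with a = b = c = 1.
pairs = [(1, [1, 2, 1]), (3, [1, 0, -1]), (4, [1, -1, 1])]


def matvec(m, v):
    """Product of a square matrix (list of rows) with a vector."""
    return [sum(m[r][c] * v[c] for c in range(len(v)))
            for r in range(len(m))]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Check A V = lambda V for each eigenpair.
checks = [matvec(A, v) == [lam * x for x in v] for lam, v in pairs]
# The eigenvectors form the columns of V.
V = [[pairs[c][1][r] for c in range(3)] for r in range(3)]
print(checks, det3(V))
```

Integer arithmetic is used so that the comparisons are exact; the determinant agrees with -6abc for a = b = c = 1.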

Example 11.2 shows how hand computations are used to find eigenvalues when
the dimension n is small: (1) find the coefficients of the characteristic polynomial;
(2) find its roots; (3) find the nonzero solutions of the homogeneous linear system
$(A - \lambda I)V = 0$. We will take the prevalent approach of studying the power and Jacobi
methods and the QR algorithm. The QR algorithm and its improvements are used in
professional software packages such as EISPACK and MATLAB ([178]).

Since V in (12) is multiplied on the right side of the matrix A, it is called a right
eigenvector corresponding to $\lambda$. There also exists a left eigenvector Y such that

(16)  $Y'A = \lambda Y'.$

In general, the left eigenvector Y is not equal to the right eigenvector V. However,
if A is real and symmetric (A' = A), then

(17)
$(AV)' = V'A' = V'A,$
$(\lambda V)' = \lambda V'.$

Therefore, the right eigenvector V is a left eigenvector when A is symmetric. In the
remainder of the book we consider only right eigenvectors.

An eigenvector V is unique only up to a constant multiple. Suppose that c is a
scalar; then the following calculation shows that cV is an eigenvector:

(18)  $A(cV) = c(AV) = c(\lambda V) = \lambda(cV).$

To regain some semblance of uniqueness, we normalize the eigenvector in one of
the following ways. Use one of the vector norms

(19)  $\|X\|_\infty = \max_{1 \le k \le n} \{|x_k|\}$

or

(20)  $\|X\|_2 = \left( \sum_{k=1}^{n} |x_k|^2 \right)^{1/2}$

and require that either $\|X\|_\infty = 1$ or $\|X\|_2 = 1$.
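The two normalizations (19) and (20) can be sketched as follows. This Python fragment is illustrative only; the function names are hypothetical, and the sample vector is $V_1$ from Example 11.2 with a = 1.

```python
import math


def normalize_inf(v):
    """Scale v so that its largest component has magnitude 1 (norm (19))."""
    scale = max(abs(x) for x in v)
    return [x / scale for x in v]


def normalize_2(v):
    """Scale v so that its Euclidean length is 1 (norm (20))."""
    scale = math.sqrt(sum(x * x for x in v))
    return [x / scale for x in v]


v = [1.0, 2.0, 1.0]
print(normalize_inf(v))   # [0.5, 1.0, 0.5]
print(normalize_2(v))
```

Either choice fixes the eigenvector up to sign (or, in the complex case, up to a factor of modulus 1).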

Diagonalizability

The eigenvalue situation is easiest to understand for a diagonal matrix D that has the
form

(21)  $D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}.$

Let $E_j = [0 \;\; 0 \;\; \cdots \;\; 0 \;\; 1 \;\; 0 \;\; \cdots \;\; 0]'$ be the standard base vector, where the jth
component is 1 and all other components are 0. Then

(22)  $DE_j = [0 \;\; 0 \;\; \cdots \;\; 0 \;\; \lambda_j \;\; 0 \;\; \cdots \;\; 0]' = \lambda_j E_j,$

which implies that the eigenpairs of D are $\lambda_j$, $E_j$ for j = 1, 2, ..., n. It is desirable
to invent a simple way of transforming the matrix A into diagonal form so that the
eigenvalues are left invariant. This is the motivation for the following definition.

Definition 11.5. Two n x n matrices A and B are said to be similar if there exists a
nonsingular matrix K so that

(23)  $B = K^{-1}AK.$ ▲

Theorem 11.7. Suppose that A and B are similar matrices and that $\lambda$ is an eigenvalue
of A with corresponding eigenvector V. Then $\lambda$ is also an eigenvalue of B. If
$K^{-1}AK = B$, then $Y = K^{-1}V$ is an eigenvector of B associated with the eigenvalue $\lambda$.

An n x n matrix A is called diagonalizable if it is similar to a diagonal matrix.
The next theorem illuminates the intimate role of eigenvectors in this process.

Theorem 11.8 (Diagonalization). The matrix A is similar to a diagonal matrix D if
and only if it has n linearly independent eigenvectors. If A is similar to D, then

(24)  $V^{-1}AV = D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n),$

(25)  $V = [V_1 \;\; V_2 \;\; \ldots \;\; V_n],$

where the n eigenpairs are $\lambda_j$, $V_j$, for j = 1, 2, ..., n.

Theorem 11.8 implies that every matrix A that has n distinct eigenvalues is diagonalizable.


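Example 11.3. Show that the following matrix is diagonalizable.

$A = \begin{bmatrix} 3 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 3 \end{bmatrix}$.

In Example 11.2 we found the eigenvalues $\lambda_1 = 1$, $\lambda_2 = 3$, and $\lambda_3 = 4$ and the matrix
of eigenvectors

$V = [V_1\ V_2\ V_3] = \begin{bmatrix} 1 & 1 & 1 \\ 2 & 0 & -1 \\ 1 & -1 & 1 \end{bmatrix}$.

The inverse matrix $V^{-1}$ is

$V^{-1} = \begin{bmatrix} 1/6 & 1/3 & 1/6 \\ 1/2 & 0 & -1/2 \\ 1/3 & -1/3 & 1/3 \end{bmatrix}$.

It is left to the reader to check the details in computing the product in (24):

$\begin{bmatrix} 1/6 & 1/3 & 1/6 \\ 1/2 & 0 & -1/2 \\ 1/3 & -1/3 & 1/3 \end{bmatrix} \begin{bmatrix} 3 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 3 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 2 & 0 & -1 \\ 1 & -1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{bmatrix}$.

Hence we have shown that A can be diagonalized; that is, $V^{-1}AV = D = \operatorname{diag}(1, 3, 4)$. ■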
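The product in (24) can also be checked numerically. The sketch below assumes the eigenvector matrix whose columns are [1 2 1]', [1 0 -1]', and [1 -1 1]', obtained by solving $(A - \lambda I)V = 0$ for $\lambda$ = 1, 3, 4; any nonzero multiples of these columns would serve equally well.

```matlab
% Sketch: numerical check of V^{-1} A V = D for the matrix of Example 11.3.
A = [3 -1 0; -1 2 -1; 0 -1 3];

% Assumed eigenvector matrix: columns belong to the eigenvalues 1, 3, 4.
V = [1 1 1; 2 0 -1; 1 -1 1];

% V \ (A*V) evaluates V^{-1} A V without forming inv(V) explicitly.
D = V \ (A * V);
disp(D);
```

Using the backslash operator instead of inv(V) is a common design choice because it solves the linear systems directly.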

A more general result relating the structure of a matrix to its eigenvalues is the
following theorem.

Theorem 11.9 (Schur). Suppose that A is an arbitrary n × n matrix. A nonsingular
matrix P exists with the property that $T = P^{-1}AP$, where T is an upper-triangular
matrix whose diagonal entries consist of the eigenvalues of A.

Certain types of structural analysis in engineering require that a basis of $\Re^n$ be
selected that consists of the eigenvectors of A. This choice makes it easier to
visualize how space is transformed by the mapping $Y = T(X) = AX$. Recall that the
eigenpair $\lambda_j$, $V_j$ has the property that T maps $V_j$ onto the multiple $\lambda_j V_j$. This
characteristic is exploited in the following theorem.

Theorem 11.10. Suppose that A is an n × n matrix that possesses n linearly
independent eigenpairs $\lambda_j$, $V_j$ for j = 1, 2, ..., n; then any vector X in $\Re^n$ has a unique
representation as a linear combination of the eigenvectors:

(25) $X = c_1V_1 + c_2V_2 + \cdots + c_nV_n$.

The linear transformation $T(X) = AX$ maps X onto the vector

(26) $Y = T(X) = c_1\lambda_1V_1 + c_2\lambda_2V_2 + \cdots + c_n\lambda_nV_n$.

Example 11.4. Suppose that the 3 × 3 matrix A has eigenvalues $\lambda_1 = 2$, $\lambda_2 = -1$,
and $\lambda_3 = 4$, which correspond to the eigenvectors $V_1 = [1\ 2\ -2]'$, $V_2 = [-2\ 1\ 1]'$,
and $V_3 = [1\ 3\ -4]'$, respectively. If $X = [-1\ 2\ 1]'$, find the image of X under the
mapping $T(X) = AX$.

We must first express X as a linear combination of the eigenvectors. This is
accomplished by solving the equation

$[-1\ 2\ 1]' = c_1[1\ 2\ -2]' + c_2[-2\ 1\ 1]' + c_3[1\ 3\ -4]'$

for $c_1$, $c_2$, and $c_3$. Observe that this is equivalent to solving the linear system

$c_1 - 2c_2 + c_3 = -1$
$2c_1 + c_2 + 3c_3 = 2$
$-2c_1 + c_2 - 4c_3 = 1$.

The solution is $c_1 = 2$, $c_2 = 1$, and $c_3 = -1$. Using Definition 11.4 for eigenvectors,
T(X) is found by the computation

$T(X) = A(2V_1 + V_2 - V_3)$
$= 2AV_1 + AV_2 - AV_3$
$= 2(2V_1) - V_2 - 4V_3$
$= [2\ -5\ 7]'$.
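The computation in this example follows (25) and (26) directly, which the sketch below mirrors in MATLAB. The variable names are chosen for illustration; note that the matrix A itself is never formed, only its eigenpairs are used.

```matlab
% Sketch: image of X under T(X) = AX using only the eigenpairs of A.
V = [1 -2 1; 2 1 3; -2 1 -4];   % columns are V1, V2, V3
lambda = [2; -1; 4];            % matching eigenvalues
X = [-1; 2; 1];

% Coefficients in X = c1*V1 + c2*V2 + c3*V3, as in (25).
c = V \ X;

% T(X) = c1*lambda1*V1 + c2*lambda2*V2 + c3*lambda3*V3, as in (26).
Y = V * (lambda .* c);

disp(c');
disp(Y');
```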

Virtues of Symmetry 

There is no easy way to determine how many linearly independent eigenvectors a
matrix possesses without resorting to using the most effective algorithms in a professional
software package such as EISPACK or MATLAB. However, it is known that a real
symmetric matrix has n real eigenvectors and that for each eigenvalue of
multiplicity $m_j$ there correspond $m_j$ linearly independent eigenvectors. Hence every real
symmetric matrix is diagonalizable.

Definition 11.6 (Orthogonal). A set of vectors $\{V_1, V_2, \ldots, V_n\}$ is said to be
orthogonal provided that

(27) $V_j'V_k = 0$ whenever $j \ne k$.

Definition 11.7 (Orthonormal). Suppose that $\{V_1, V_2, \ldots, V_n\}$ is a set of
orthogonal vectors; then we say that they are orthonormal if they are all of unit norm, that
is,

(28) $V_j'V_k = 0$ whenever $j \ne k$, and $V_j'V_j = 1$ for all j = 1, 2, ..., n.

Theorem 11.11. An orthonormal set of vectors is linearly independent. 

Remark. The zero vector cannot belong to an orthonormal set of vectors. 

Definition 11.8 (Orthogonal Matrix). An n × n matrix A is said to be orthogonal
provided that A' is the inverse of A; that is,

(29) $A'A = I$,

which is equivalent to

(30) $A^{-1} = A'$.

Also, A is orthogonal if and only if the columns (and rows) of A form a set of
orthonormal vectors.

Theorem 11.12. If A is a real symmetric matrix, there exists an orthogonal matrix K
such that

(31) $K'AK = K^{-1}AK = D$,

where D is a diagonal matrix consisting of the eigenvalues of A.

Corollary 11.1. If A is an n × n real symmetric matrix, there exist n linearly
independent eigenvectors for A, and they form an orthogonal set.

Corollary 11.2. The eigenvalues of a real symmetric matrix are all real numbers.

Theorem 11.13. Eigenvectors corresponding to distinct eigenvalues of a symmetric
matrix are orthogonal.

Theorem 11.14. A symmetric matrix A is positive definite if and only if all the
eigenvalues of A are positive.

An Overview of Methods

For problems involving moderate-sized symmetric matrices, it is safe to use Jacobi's
method. For problems involving large symmetric matrices (for n up to several
hundred), it is best to use Householder's method to produce a tridiagonal form, followed
by the QR algorithm. Unlike real symmetric matrices, real unsymmetric matrices can
have complex eigenvalues and eigenvectors.

For matrices that possess a dominant eigenvalue, the power method can be used
to find the dominant eigenvector. Deflation techniques can be used thereafter to find
the first few subdominant eigenvectors. For real unsymmetric matrices, Householder's
method is used to produce a Hessenberg matrix, followed by the LR or QR algorithm.
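For a real symmetric matrix, the orthogonal diagonalization of Theorem 11.12 and the positivity test of Theorem 11.14 can be checked with MATLAB's built-in eig. The sketch below uses the symmetric matrix of Example 11.3 as an assumed test case; the names K, orthoErr, diagErr, and isPosDef are illustrative.

```matlab
% Sketch: check K'AK = D and K'K = I for a real symmetric matrix.
A = [3 -1 0; -1 2 -1; 0 -1 3];

% For a symmetric input, eig returns orthonormal eigenvectors in K
% and the eigenvalues on the diagonal of D.
[K, D] = eig(A);

orthoErr = norm(K' * K - eye(3));   % departure from orthogonality
diagErr  = norm(K' * A * K - D);    % departure from (31)
isPosDef = all(diag(D) > 0);        % Theorem 11.14: all eigenvalues positive

disp(diag(D)');
disp([orthoErr diagErr]);
```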


Exercises for Homogeneous Systems: The Eigenvalue Problem 


1. For each of the following matrices find (i) the characteristic polynomial $p(\lambda)$, (ii) the
eigenvalues, and (iii) an eigenvector for each eigenvalue.


‘1 6 
9 2 


(d) A = 


12 1 

1 0 1 2 
-1 3 2 


<c) A = . 


0 2 2 3 
0 0 3 2 
0 0 0 4 


2. Determine the spectral radius of each of the matrices in Exercise 1.

3. Determine the HAjU and !f A |! 1=c norms of each of the matinees in Exercise 1, 

4. Determine which, if any, of the matrices in Exercise 1 are diagonalizable. For each 
diagonalizable matrix in Exercise 1, find the matrices V and D from Theorem 11.8 
and carry out the matrix product in (24). 

5. (a) For any fixed $\theta$, show that

$R = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$

is an orthogonal matrix.

Remark. The matrix R is called a rotation matrix.

(b) Determine all values of $\theta$ for which all the eigenvalues of R are real.

6. In Section 3.2 the plane rotations $R_x(\alpha)$, $R_y(\beta)$, and $R_z(\gamma)$ were introduced.

(a) For any fixed $\alpha$, $\beta$, and $\gamma$, show that $R_x(\alpha)$, $R_y(\beta)$, and $R_z(\gamma)$, respectively,
are orthogonal matrices.

(b) Determine all values of $\alpha$, $\beta$, and $\gamma$ for which all the eigenvalues of $R_x(\alpha)$,
$R_y(\beta)$, and $R_z(\gamma)$, respectively, are real.

7. Let $A = \begin{bmatrix} a + 3 & 2 \\ 2 & a \end{bmatrix}$.

(a) Show that the characteristic polynomial is $p(\lambda) = \lambda^2 - (3 + 2a)\lambda + a^2 + 3a - 4$.

(b) Show that the eigenvalues of A are $\lambda_1 = a + 4$ and $\lambda_2 = a - 1$.

(c) Show that the eigenvectors of A are $V_1 = [2\ 1]'$ and $V_2 = [-1\ 2]'$.

8. Assume that $\lambda$, V form an eigenpair of the matrix A. If k is a positive integer, prove
that $\lambda^k$, V are an eigenpair of the matrix $A^k$.

9. Suppose that V is an eigenvector of A that corresponds to the eigenvalue $\lambda = 3$.
Prove that $\lambda = 9$ is an eigenvalue of the matrix $A^2$ corresponding to V.

10. Suppose that V is an eigenvector of A that corresponds to the eigenvalue $\lambda = 2$.
Prove that $\lambda = 1/2$ is an eigenvalue of the matrix $A^{-1}$ corresponding to V.

11. Suppose that V is an eigenvector of A that corresponds to the eigenvalue $\lambda = 5$.
Prove that $\lambda = 4$ is an eigenvalue of the matrix $A - I$ corresponding to V.

12. Let A be an n × n square matrix with characteristic polynomial $p(\lambda)$ given by

$p(\lambda) = \det(A - \lambda I) = (-1)^n(\lambda^n + c_1\lambda^{n-1} + c_2\lambda^{n-2} + \cdots + c_{n-1}\lambda + c_n)$.

(a) Show that the constant term of $p(\lambda)$ is $c_n = (-1)^n \det(A)$.

(b) Show that the coefficient of $\lambda^{n-1}$ is $c_1 = -(a_{11} + a_{22} + \cdots + a_{nn})$.

13. Assume that A is similar to a diagonal matrix; that is,

$V^{-1}AV = D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$.

If k is a positive integer, prove that

$A^k = V \operatorname{diag}(\lambda_1^k, \lambda_2^k, \ldots, \lambda_n^k) V^{-1}$.


11.2 Power Method 

We now describe the power method for computing the dominant eigenpair. Its
extension to the inverse power method is practical for finding any eigenvalue provided that a
good initial approximation is known. Some schemes for finding eigenvalues use other
methods that converge fast, but have limited precision. The inverse power method is
then invoked to refine the numerical values and gain full precision. To discuss the
situation, we will need the following definitions.

Definition 11.10. If $\lambda_1$ is an eigenvalue of A that is larger in absolute value than any
other eigenvalue, it is called the dominant eigenvalue. An eigenvector $V_1$
corresponding to $\lambda_1$ is called a dominant eigenvector.

Definition 11.11. An eigenvector V is said to be normalized if the coordinate of
largest magnitude is equal to unity (i.e., the largest coordinate in the vector V is the
number 1).

It is easy to normalize an eigenvector $[v_1\ v_2\ \cdots\ v_n]'$ by forming a new vector
$V = (1/c)[v_1\ v_2\ \cdots\ v_n]'$, where $c = v_j$ and $|v_j| = \max_{1 \le i \le n}\{|v_i|\}$.

Suppose that the matrix A has a dominant eigenvalue $\lambda$ and that there is a unique
normalized eigenvector V that corresponds to $\lambda$. This eigenpair $\lambda$, V can be found by
the following iterative procedure called the power method. Start with the vector

(1) $X_0 = [1\ 1\ \cdots\ 1]'$.

Generate the sequence $\{X_k\}$ recursively, using

(2) $Y_k = AX_k$, $X_{k+1} = \frac{1}{c_{k+1}}Y_k$,

where $c_{k+1}$ is the coordinate of $Y_k$ of largest magnitude (in the case of a tie, choose
the coordinate that comes first). The sequences $\{X_k\}$ and $\{c_k\}$ will converge to V and
$\lambda$, respectively:

$\lim_{k \to \infty} X_k = V$ and $\lim_{k \to \infty} c_k = \lambda$.

Remark. If $X_0$ is an eigenvector and $X_0 \ne V$, then some other starting vector must be
chosen.

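Example 11.5. Use the power method to find the dominant eigenvalue and eigenvector
for the matrix

$A = \begin{bmatrix} 0 & 11 & -5 \\ -2 & 17 & -7 \\ -4 & 26 & -10 \end{bmatrix}$.

Start with $X_0 = [1\ 1\ 1]'$ and use the formulas in (2) to generate the sequence of
vectors $\{X_k\}$ and constants $\{c_k\}$. The first iteration produces

$\begin{bmatrix} 0 & 11 & -5 \\ -2 & 17 & -7 \\ -4 & 26 & -10 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 6 \\ 8 \\ 12 \end{bmatrix} = 12 \begin{bmatrix} 1/2 \\ 2/3 \\ 1 \end{bmatrix} = c_1X_1$.

The second iteration produces

$\begin{bmatrix} 0 & 11 & -5 \\ -2 & 17 & -7 \\ -4 & 26 & -10 \end{bmatrix} \begin{bmatrix} 1/2 \\ 2/3 \\ 1 \end{bmatrix} = \begin{bmatrix} 7/3 \\ 10/3 \\ 16/3 \end{bmatrix} = \frac{16}{3} \begin{bmatrix} 7/16 \\ 5/8 \\ 1 \end{bmatrix} = c_2X_2$.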
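The iteration in (2) can be sketched in a few lines of MATLAB. This is a minimal illustration with a fixed, assumed iteration count of 50 instead of a convergence test; the history vector chist is an illustrative name.

```matlab
% Sketch: power method iterations (2) for the matrix of this example.
A = [0 11 -5; -2 17 -7; -4 26 -10];
X = [1; 1; 1];
nsteps = 50;                 % assumed fixed number of iterations
chist = zeros(nsteps, 1);    % records c_1, c_2, ...

for k = 1:nsteps
    Y = A * X;
    [~, j] = max(abs(Y));    % first index of the entry of largest magnitude
    c = Y(j);                % signed scale factor c_{k+1}
    X = Y / c;               % normalized vector X_{k+1}
    chist(k) = c;
end

disp(chist(1:2)');           % the first two scale factors
disp(c);
disp(X');
```

Keeping the signed entry Y(j) instead of its absolute value matters when the dominant eigenvalue is negative, since otherwise the iterates would alternate in sign.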








will converge to the dominant eigenvector $V_1$ and eigenvalue $\lambda_1$, respectively. That is,
$\lim_{k \to \infty} X_k = V_1$ and $\lim_{k \to \infty} c_k = \lambda_1$.

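Proof. Since A has n eigenvalues, there are n corresponding eigenvectors $V_j$, for
j = 1, 2, ..., n, that are linearly independent, normalized, and form a basis for
n-dimensional space. Hence the starting vector $X_0$ can be expressed as the linear
combination

(9) $X_0 = b_1V_1 + b_2V_2 + \cdots + b_nV_n$.

Assume that $X_0 = [x_1\ x_2\ \cdots\ x_n]'$ was chosen in such a manner that $b_1 \ne 0$. Also,
assume that the coordinates of $X_0$ are scaled so that $\max_{1 \le j \le n}\{|x_j|\} = 1$. Because
$\{V_j\}_{j=1}^{n}$ are eigenvectors of A, the multiplication $AX_0$, followed by normalization,
produces

(10) $Y_0 = AX_0 = A(b_1V_1 + b_2V_2 + \cdots + b_nV_n)$
$= b_1AV_1 + b_2AV_2 + \cdots + b_nAV_n$
$= b_1\lambda_1V_1 + b_2\lambda_2V_2 + \cdots + b_n\lambda_nV_n$
$= \lambda_1\left(b_1V_1 + b_2\left(\frac{\lambda_2}{\lambda_1}\right)V_2 + \cdots + b_n\left(\frac{\lambda_n}{\lambda_1}\right)V_n\right)$

and

$X_1 = \frac{\lambda_1}{c_1}\left(b_1V_1 + b_2\left(\frac{\lambda_2}{\lambda_1}\right)V_2 + \cdots + b_n\left(\frac{\lambda_n}{\lambda_1}\right)V_n\right)$.

After k iterations we arrive at

(11) $Y_{k-1} = AX_{k-1}$
$= A\,\frac{\lambda_1^{k-1}}{c_1c_2\cdots c_{k-1}}\left(b_1V_1 + b_2\left(\frac{\lambda_2}{\lambda_1}\right)^{k-1}V_2 + \cdots + b_n\left(\frac{\lambda_n}{\lambda_1}\right)^{k-1}V_n\right)$
$= \frac{\lambda_1^{k-1}}{c_1c_2\cdots c_{k-1}}\left(b_1\lambda_1V_1 + b_2\lambda_2\left(\frac{\lambda_2}{\lambda_1}\right)^{k-1}V_2 + \cdots + b_n\lambda_n\left(\frac{\lambda_n}{\lambda_1}\right)^{k-1}V_n\right)$
$= \frac{\lambda_1^{k}}{c_1c_2\cdots c_{k-1}}\left(b_1V_1 + b_2\left(\frac{\lambda_2}{\lambda_1}\right)^{k}V_2 + \cdots + b_n\left(\frac{\lambda_n}{\lambda_1}\right)^{k}V_n\right)$

and

$X_k = \frac{\lambda_1^{k}}{c_1c_2\cdots c_k}\left(b_1V_1 + b_2\left(\frac{\lambda_2}{\lambda_1}\right)^{k}V_2 + \cdots + b_n\left(\frac{\lambda_n}{\lambda_1}\right)^{k}V_n\right)$.

Since we assumed that $|\lambda_j|/|\lambda_1| < 1$ for each j = 2, 3, ..., n, we have

(12) $\lim_{k \to \infty} b_j\left(\frac{\lambda_j}{\lambda_1}\right)^{k}V_j = 0$ for each j = 2, 3, ..., n.

Hence it follows that

(13) $\lim_{k \to \infty} X_k = \lim_{k \to \infty} \frac{b_1\lambda_1^{k}}{c_1c_2\cdots c_k}V_1$.

We have required that both $X_k$ and $V_1$ be normalized and their largest component be 1.
Hence the limiting vector on the left side of (13) will be normalized, with its largest
component being 1. Consequently, the limit of the scalar multiple of $V_1$ on the right
side of (13) exists and its value must be 1; that is,

(14) $\lim_{k \to \infty} \frac{b_1\lambda_1^{k}}{c_1c_2\cdots c_k} = 1$.

Therefore, the sequence of vectors $\{X_k\}$ converges to the dominant eigenvector:

(15) $\lim_{k \to \infty} X_k = V_1$.

Replacing k with k - 1 in the terms of the sequence in (14) yields

$\lim_{k \to \infty} \frac{b_1\lambda_1^{k-1}}{c_1c_2\cdots c_{k-1}} = 1$,

and dividing both sides of this result into (14) yields

$\lim_{k \to \infty} \frac{\lambda_1}{c_k} = \lim_{k \to \infty} \frac{b_1\lambda_1^{k}/(c_1c_2\cdots c_k)}{b_1\lambda_1^{k-1}/(c_1c_2\cdots c_{k-1})} = \frac{1}{1} = 1$.

Therefore, the sequence of constants $\{c_k\}$ converges to the dominant eigenvalue:

(16) $\lim_{k \to \infty} c_k = \lambda_1$,

and the proof of the theorem is complete.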


Speed of Convergence 

In the light of equation (12) we see that the coefficient of $V_j$ in $X_k$ goes to zero in
proportion to $(\lambda_j/\lambda_1)^k$ and that the speed of convergence of $\{X_k\}$ to $V_1$ is governed
by the terms $(\lambda_2/\lambda_1)^k$. Consequently, the rate of convergence is linear. Similarly, the
convergence of the sequence of constants $\{c_k\}$ to $\lambda_1$ is linear. The Aitken $\Delta^2$ method
can be used for any linearly convergent sequence $\{p_k\}$ to form a new sequence

$\left\{\hat{p}_k = p_k - \frac{(p_{k+1} - p_k)^2}{p_{k+2} - 2p_{k+1} + p_k}\right\}$

that converges faster. In Example 11.5 this Aitken $\Delta^2$ method can be applied to speed
up convergence of the sequence of constants $\{c_k\}$, as well as the first two components of
the sequence of vectors $\{X_k\}$. A comparison of the results obtained with this technique
and the original sequences is shown in Table 11.2.

Table 11.2 Comparison of the Rate of Convergence of the Power Method and Acceleration of
the Power Method Using Aitken's $\Delta^2$ Technique

 k   $c_kX_k$ (power method)                  $\hat{c}_k\hat{X}_k$ (Aitken $\Delta^2$)
 1   12.000000 [0.5000000 0.6666667 1]'     4.3809524 [0.4062500 0.6041667 1]'
 2   5.3333333 [0.4375000 0.6250000 1]'     4.0833333 [0.4015152 0.6010101 1]'
 3   4.5000000 [0.4166667 0.6111111 1]'     4.0202020 [0.4003759 0.6002506 1]'
 4   4.2222222 [0.4078947 0.6052632 1]'     4.0050125 [0.4000938 0.6000625 1]'
 5   4.1052632 [0.4038462 0.6025641 1]'     4.0012508 [0.4000234 0.6000156 1]'
 6   4.0512821 [0.4018987 0.6012658 1]'     4.0003125 [0.4000059 0.6000039 1]'
 7   4.0253165 [0.4009434 0.6006289 1]'     4.0000781 [0.4000015 0.6000010 1]'
 8   4.0125786 [0.4004702 0.6003135 1]'     4.0000195 [0.4000004 0.6000002 1]'
 9   4.0062696 [0.4002347 0.6001565 1]'     4.0000049 [0.4000001 0.6000001 1]'
10   4.0031299 [0.4001173 0.6000782 1]'     4.0000012 [0.4000000 0.6000000 1]'
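The acceleration of the constants can be sketched as follows. The code assumes the matrix of the power method example, A = [0 11 -5; -2 17 -7; -4 26 -10], with starting vector [1 1 1]', and applies the standard Aitken formula $\hat{p}_k = p_k - (p_{k+1} - p_k)^2/(p_{k+2} - 2p_{k+1} + p_k)$ to the sequence of scale factors; the names c and chat are illustrative.

```matlab
% Sketch: Aitken acceleration of the power method constants c_k.
A = [0 11 -5; -2 17 -7; -4 26 -10];
X = [1; 1; 1];
n = 12;                      % assumed number of power iterations
c = zeros(n, 1);

for k = 1:n
    Y = A * X;
    [~, j] = max(abs(Y));    % entry of largest magnitude
    c(k) = Y(j);
    X = Y / c(k);
end

% Aitken: chat(k) = c(k) - (c(k+1)-c(k))^2 / (c(k+2) - 2*c(k+1) + c(k)).
d1 = c(2:n-1) - c(1:n-2);
d2 = c(3:n) - 2*c(2:n-1) + c(1:n-2);
chat = c(1:n-2) - d1.^2 ./ d2;

disp([c(1:n-2) chat]);       % left: original, right: accelerated
```

Each accelerated value needs two later terms of the original sequence, so n iterations yield n - 2 accelerated values.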

Shifted-inverse Power Method

We will now discuss the shifted inverse power method. It requires a good starting
approximation for an eigenvalue, and then iteration is used to obtain a precise solution.
Other procedures such as the QR and Givens' method are used first to obtain the
starting approximations. Cases involving complex eigenvalues, multiple eigenvalues,
or the presence of two eigenvalues with the same magnitude or approximately the same
magnitude, will cause computational difficulties and require more advanced methods.
Our illustrations will focus on the case where the eigenvalues are distinct. The shifted
inverse power method is based on the following three results (the proofs are left as
exercises).

Theorem 11.19 (Shifting Eigenvalues). Suppose that $\lambda$, V is an eigenpair of A. If
$\alpha$ is any constant, then $\lambda - \alpha$, V is an eigenpair of the matrix $A - \alpha I$.

Theorem 11.20 (Inverse Eigenvalues). Suppose that $\lambda$, V is an eigenpair of A. If
$\lambda \ne 0$, then $1/\lambda$, V is an eigenpair of the matrix $A^{-1}$.

[Figure 11.2 The location of $\alpha$ for the shifted-inverse power method.]

Theorem 11.21. Suppose that $\lambda$, V is an eigenpair of A. If $\alpha \ne \lambda$, then $1/(\lambda - \alpha)$, V
is an eigenpair of the matrix $(A - \alpha I)^{-1}$.
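These three results are easy to confirm numerically for one eigenpair. The sketch below assumes the matrix A = [0 11 -5; -2 17 -7; -4 26 -10], its eigenpair $\lambda = 4$, V = [0.4 0.6 1]', and an arbitrarily chosen shift $\alpha = 4.2$; it is a spot check, not a proof.

```matlab
% Sketch: residual checks of the shift, inverse, and shifted-inverse results.
A = [0 11 -5; -2 17 -7; -4 26 -10];
lambda = 4;
V = [0.4; 0.6; 1];           % eigenvector: A*V = 4*V
alpha = 4.2;                 % assumed shift, alpha ~= lambda
I = eye(3);

% Theorem 11.19: (A - alpha*I) V = (lambda - alpha) V.
r1 = norm((A - alpha*I) * V - (lambda - alpha) * V);

% Theorem 11.20: A^{-1} V = (1/lambda) V, computed by a linear solve.
r2 = norm(A \ V - (1/lambda) * V);

% Theorem 11.21: (A - alpha*I)^{-1} V = V / (lambda - alpha).
r3 = norm((A - alpha*I) \ V - V / (lambda - alpha));

disp([r1 r2 r3]);            % all three residuals should be near zero
```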

Theorem 11.22 (Shifted-inverse Power Method). Assume that the n × n matrix
A has distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ and consider the eigenvalue $\lambda_j$. Then a
constant $\alpha$ can be chosen so that $\mu_1 = 1/(\lambda_j - \alpha)$ is the dominant eigenvalue of
$(A - \alpha I)^{-1}$. Furthermore, if $X_0$ is chosen appropriately, then the sequences $\{X_k\}$
and $\{c_k\}$ generated recursively by

(17) $Y_k = (A - \alpha I)^{-1}X_k$

and

(18) $X_{k+1} = \frac{1}{c_{k+1}}Y_k$,

where, writing $Y_k = [y_1^{(k)}\ y_2^{(k)}\ \cdots\ y_n^{(k)}]'$,

(19) $c_{k+1} = y_j^{(k)}$ and $|y_j^{(k)}| = \max_{1 \le i \le n}\{|y_i^{(k)}|\}$,

will converge to the dominant eigenpair $\mu_1$, $V_j$ of the matrix $(A - \alpha I)^{-1}$. Finally, the
corresponding eigenvalue for the matrix A is given by the calculation

(20) $\lambda_j = \frac{1}{\mu_1} + \alpha$.

Remark. For practical implementations of Theorem 11.22, a linear system solver is
used to compute $Y_k$ in each step by solving the linear system $(A - \alpha I)Y_k = X_k$.

Proof. Without loss of generality, we may assume that $\lambda_1 < \lambda_2 < \cdots < \lambda_n$.
Select a number $\alpha$ ($\alpha \ne \lambda_j$) that is closer to $\lambda_j$ than any of the other eigenvalues (see
Figure 11.2), that is,

(21) $|\lambda_j - \alpha| < |\lambda_i - \alpha|$ for each i = 1, 2, ..., j - 1, j + 1, ..., n.

According to Theorem 11.21, $1/(\lambda_j - \alpha)$, V is an eigenpair of the matrix
$(A - \alpha I)^{-1}$. Relation (21) implies that $1/|\lambda_i - \alpha| < 1/|\lambda_j - \alpha|$ for each $i \ne j$
so that $\mu_1 = 1/(\lambda_j - \alpha)$ is the dominant eigenvalue of the matrix $(A - \alpha I)^{-1}$. The
shifted-inverse power method uses a modification of the power method to determine
the eigenpair $\mu_1$, $V_j$. Then the calculation $\lambda_j = 1/\mu_1 + \alpha$ produces the desired
eigenvalue of the matrix A.

Table 11.3 Shifted-inverse Power Method for the Matrix $(A - 4.2I)^{-1}$ in
Example 11.6: Convergence to the Eigenvector $V = [\frac{2}{5}\ \frac{3}{5}\ 1]'$ and $\mu_1 = -5$

$(A - \alpha I)^{-1}X_k = c_{k+1}X_{k+1}$
$(A - \alpha I)^{-1}X_0$ = -23.18181818 [0.4117647059 0.6078431373 1]' = $c_1X_1$
$(A - \alpha I)^{-1}X_1$ = -5.356506239 [0.4009983361 0.6006655574 1]' = $c_2X_2$
$(A - \alpha I)^{-1}X_2$ = -5.030252609 [0.4000902120 0.6000601413 1]' = $c_3X_3$
$(A - \alpha I)^{-1}X_3$ = -5.002733697 [0.4000081966 0.6000054644 1]' = $c_4X_4$
$(A - \alpha I)^{-1}X_4$ = -5.000248382 [0.4000007451 0.6000004957 1]' = $c_5X_5$
$(A - \alpha I)^{-1}X_5$ = -5.000022579 [0.4000000677 0.6000000452 1]' = $c_6X_6$
$(A - \alpha I)^{-1}X_6$ = -5.000002053 [0.4000000062 0.6000000041 1]' = $c_7X_7$
$(A - \alpha I)^{-1}X_7$ = -5.000000187 [0.4000000006 0.6000000004 1]' = $c_8X_8$
$(A - \alpha I)^{-1}X_8$ = -5.000000017 [0.4000000001 0.6000000000 1]' = $c_9X_9$


Example 11.6. Employ the shifted-inverse power method to find the eigenpairs of the
matrix

$A = \begin{bmatrix} 0 & 11 & -5 \\ -2 & 17 & -7 \\ -4 & 26 & -10 \end{bmatrix}$.

Use the fact that the eigenvalues of A are $\lambda_1 = 4$, $\lambda_2 = 2$, and $\lambda_3 = 1$, and select an
appropriate $\alpha$ and starting vector for each case.

Case (i): For the eigenvalue $\lambda_1 = 4$, we select $\alpha = 4.2$ and the starting vector
$X_0 = [1\ 1\ 1]'$. First, form the matrix $A - 4.2I$, compute the solution to

$\begin{bmatrix} -4.2 & 11 & -5 \\ -2 & 12.8 & -7 \\ -4 & 26 & -14.2 \end{bmatrix} Y_0 = X_0 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$,

and get the vector $Y_0 = [-9.545454545\ -14.09090909\ -23.18181818]'$. Then compute
$c_1 = -23.18181818$ and $X_1 = [0.4117647059\ 0.6078431373\ 1]'$. Iteration generates
the values given in Table 11.3. The sequence $\{c_k\}$ converges to $\mu_1 = -5$, which is the
dominant eigenvalue of $(A - 4.2I)^{-1}$, and $\{X_k\}$ converges to $V_1 = [\frac{2}{5}\ \frac{3}{5}\ 1]'$. The
eigenvalue $\lambda_1$ of A is given by the computation $\lambda_1 = 1/\mu_1 + \alpha = 1/(-5) + 4.2 = -0.2 + 4.2 = 4$.

Case (ii): For the eigenvalue $\lambda_2 = 2$, we select $\alpha = 2.1$ and the starting vector
$X_0 = [1\ 1\ 1]'$. Form the matrix $A - 2.1I$, compute the solution to

$\begin{bmatrix} -2.1 & 11 & -5 \\ -2 & 14.9 & -7 \\ -4 & 26 & -12.1 \end{bmatrix} Y_0 = X_0 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$,

and obtain the vector $Y_0 = [11.05263158\ 21.57894737\ 42.63157895]'$. Then $c_1 =
42.63157895$ and vector $X_1 = [0.2592592593\ 0.5061728395\ 1]'$. Iteration produces the
values given in Table 11.4. The dominant eigenvalue of $(A - 2.1I)^{-1}$ is $\mu_1 = -10$, and the
eigenpair of the matrix A is $\lambda_2 = 1/(-10) + 2.1 = -0.1 + 2.1 = 2$ and $V_2 = [\frac{1}{4}\ \frac{1}{2}\ 1]'$.

Table 11.4 Shifted-inverse Power Method for the Matrix $(A - 2.1I)^{-1}$ in
Example 11.6: Convergence to the Dominant Eigenvector $V = [\frac{1}{4}\ \frac{1}{2}\ 1]'$ and
$\mu_1 = -10$

$(A - \alpha I)^{-1}X_k = c_{k+1}X_{k+1}$
$(A - \alpha I)^{-1}X_0$ = 42.63157895 [0.2592592593 0.5061728395 1]' = $c_1X_1$
$(A - \alpha I)^{-1}X_1$ = -9.350227420 [0.2494788047 0.4996525365 1]' = $c_2X_2$
$(A - \alpha I)^{-1}X_2$ = -10.03657511 [0.2500273314 0.5000182209 1]' = $c_3X_3$
$(A - \alpha I)^{-1}X_3$ = -9.998082009 [0.2499985612 0.4999990408 1]' = $c_4X_4$
$(A - \alpha I)^{-1}X_4$ = -10.00010097 [0.2500000757 0.5000000505 1]' = $c_5X_5$
$(A - \alpha I)^{-1}X_5$ = -9.999994686 [0.2499999960 0.4999999973 1]' = $c_6X_6$
$(A - \alpha I)^{-1}X_6$ = -10.00000028 [0.2500000002 0.5000000001 1]' = $c_7X_7$

Case (iii): For the eigenvalue $\lambda_3 = 1$, we select $\alpha = 0.875$ and the starting vector
$X_0 = [0\ 1\ 1]'$. Iteration produces the values given in Table 11.5. The dominant
eigenvalue of $(A - 0.875I)^{-1}$ is $\mu_1 = 8$, and the eigenpair of matrix A is $\lambda_3 = 1/8 + 0.875 =
0.125 + 0.875 = 1$ and $V_3 = [\frac{1}{2}\ \frac{1}{2}\ 1]'$. The sequence $\{X_k\}$ of vectors with the starting
vector $[0\ 1\ 1]'$ converged in seven iterations. (Computational difficulties were
encountered when $X_0 = [1\ 1\ 1]'$ was used, and convergence took significantly longer.) ■

Table 11.5 Shifted-inverse Power Method for the Matrix $(A - 0.875I)^{-1}$ in
Example 11.6: Convergence to the Dominant Eigenvector $V = [\frac{1}{2}\ \frac{1}{2}\ 1]'$ and
$\mu_1 = 8$

$(A - \alpha I)^{-1}X_k = c_{k+1}X_{k+1}$
$(A - \alpha I)^{-1}X_0$ = -30.40000000 [0.5052631579 0.4947368421 1]' = $c_1X_1$
$(A - \alpha I)^{-1}X_1$ = 8.404210526 [0.5002004008 0.4997995992 1]' = $c_2X_2$
$(A - \alpha I)^{-1}X_2$ = 8.015390782 [0.5000080006 0.4999919994 1]' = $c_3X_3$
$(A - \alpha I)^{-1}X_3$ = 8.000614449 [0.5000003200 0.4999996800 1]' = $c_4X_4$
$(A - \alpha I)^{-1}X_4$ = 8.000024576 [0.5000000128 0.4999999872 1]' = $c_5X_5$
$(A - \alpha I)^{-1}X_5$ = 8.000000983 [0.5000000005 0.4999999995 1]' = $c_6X_6$


Program 11.1 (Power Method). To compute the dominant eigenvalue $\lambda_1$ and its
associated eigenvector $V_1$ for the n × n matrix A. It is assumed that the n eigenvalues
have the dominance property $|\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_n| > 0$.

function [lambda,V]=power1(A,X,epsilon,max1)
%Input  - A is an nxn matrix
%       - X is the nx1 starting vector
%       - epsilon is the tolerance
%       - max1 is the maximum number of iterations
%Output - lambda is the dominant eigenvalue
%       - V is the dominant eigenvector
%Initialize parameters
lambda=0;
cnt=0;
err=1;
state=1;
while ((cnt<=max1)&&(state==1))
   Y=A*X;
   %Normalize Y by its coordinate of largest magnitude (sign kept)
   [m j]=max(abs(Y));
   c1=Y(j);
   dc=abs(lambda-c1);
   Y=(1/c1)*Y;
   %Update X and lambda and check for convergence
   dv=norm(X-Y);
   err=max(dc,dv);
   X=Y;
   lambda=c1;
   state=0;
   if (err>epsilon)
      state=1;
   end
   cnt=cnt+1;
end
V=X;

Program 11.2 (Shifted-inverse Power Method). To compute the eigenvalue $\lambda_j$
closest to the shift $\alpha$ and its associated eigenvector $V_j$ for the n × n matrix A. It is assumed that
the n eigenvalues have the property $\lambda_1 < \lambda_2 < \cdots < \lambda_n$ and that $\alpha$ is a real number
such that $|\lambda_j - \alpha| < |\lambda_i - \alpha|$, for each i = 1, 2, ..., j - 1, j + 1, ..., n.

function [lambda,V]=invpow(A,X,alpha,epsilon,max1)
%Input  - A is an nxn matrix
%       - X is the nx1 starting vector
%       - alpha is the given shift
%       - epsilon is the tolerance
%       - max1 is the maximum number of iterations
%Output - lambda is the eigenvalue of A closest to alpha
%       - V is the associated eigenvector
%Initialize the matrix A-alpha*I and parameters
n=size(A,1);
A=A-alpha*eye(n);
lambda=0;
cnt=0;
err=1;
state=1;
while ((cnt<=max1)&&(state==1))
   %Solve system AY=X
   Y=A\X;
   %Normalize Y by its coordinate of largest magnitude (sign kept)
   [m j]=max(abs(Y));
   c1=Y(j);
   dc=abs(lambda-c1);
   Y=(1/c1)*Y;
   %Update X and lambda and check for convergence
   dv=norm(X-Y);
   err=max(dc,dv);
   X=Y;
   lambda=c1;
   state=0;
   if (err>epsilon)
      state=1;
   end
   cnt=cnt+1;
end
lambda=alpha+1/c1;
V=X;
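Because the shifted matrix does not change between iterations, one possible refinement is to factor it once and reuse the factors in every solve. The sketch below is a variation on the program, not the program itself: it assumes the matrix A = [0 11 -5; -2 17 -7; -4 26 -10] with shift 4.2, uses MATLAB's built-in lu, and runs an assumed fixed count of 30 iterations.

```matlab
% Sketch: shifted-inverse iteration with a single LU factorization.
A = [0 11 -5; -2 17 -7; -4 26 -10];
alpha = 4.2;                          % shift near the eigenvalue 4
X = [1; 1; 1];

% Factor once: P*(A - alpha*I) = L*U.
[L, U, P] = lu(A - alpha * eye(3));

for k = 1:30                          % assumed fixed iteration count
    Y = U \ (L \ (P * X));            % solves (A - alpha*I)*Y = X
    [~, j] = max(abs(Y));             % entry of largest magnitude
    c = Y(j);                         % signed scale factor
    X = Y / c;
end

lambda = alpha + 1 / c;               % eigenvalue of A recovered from c
disp(c);
disp(lambda);
disp(X');
```

For a 3 × 3 matrix the saving is negligible, but for larger matrices the two triangular solves per step are much cheaper than a fresh factorization.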


Exercises for Power Method 

1. Let $\lambda$, V be an eigenpair of A. If $\alpha$ is any constant, show that $\lambda - \alpha$, V is an eigenpair
of the matrix $A - \alpha I$.

2. Let $\lambda$, V be an eigenpair of A. If $\lambda \ne 0$, show that $1/\lambda$, V is an eigenpair of the
matrix $A^{-1}$.

3. Let $\lambda$, V be an eigenpair of A. If $\alpha \ne \lambda$, show that $1/(\lambda - \alpha)$, V is an eigenpair of
the matrix $(A - \alpha I)^{-1}$.

4. Deflation techniques. Suppose that $\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_n$ are the eigenvalues of A with
associated eigenvectors $V_1, V_2, V_3, \ldots, V_n$ and that $\lambda_1$ has multiplicity 1. If X is
any vector with the property that $X'V_1 = 1$, prove that the matrix

$B = A - \lambda_1V_1X'$

has eigenvalues $0, \lambda_2, \lambda_3, \ldots, \lambda_n$ with associated eigenvectors $V_1, W_2, W_3, \ldots, W_n$,
where $V_j$ and $W_j$ are related by the equation

$V_j = (\lambda_j - \lambda_1)W_j + \lambda_1(X'W_j)V_1$ for each j = 2, 3, ..., n.

5. Markov processes and eigenvalues. A Markov process can be described by a square
matrix A whose entries are all positive and the column sums all equal 1. For
illustration, let $P_0 = [x^{(0)}\ y^{(0)}]'$ record the number of people in a certain city who use
brands X and Y, respectively. Each month people decide to keep using the same brand
or switch brands. The probability that a user of brand X will switch to brand Y is 0.3.
The probability that a user of brand Y will switch to brand X is 0.2. The transition
matrix A for this process satisfies

$P_{k+1} = AP_k$, where $P_k = [x^{(k)}\ y^{(k)}]'$.

If $AP_j = P_j$ for some j, then $P_j = V$ is said to be the steady-state distribution
for the Markov process. Thus, if there is a steady-state distribution, then $\lambda = 1$ must
be an eigenvalue of A. Additionally, the steady-state distribution V is an eigenvector
associated with $\lambda = 1$ (i.e., solve $(A - I)V = 0$).

(a) For the example given above, verify that $\lambda = 1$ is an eigenvalue of the transition
matrix A.

(b) Verify that the set of eigenvectors associated with $\lambda = 1$ is $\{t\,[3/2\ 1]' : t \in \Re,\ t \ne 0\}$.

(c) Assume that the population of the city was 50,000. Use your results from
part (b) to verify that the steady-state distribution is [30,000 20,000]'.

5. Suppose that the probability that a user of brand X will switch to brand Y or Z is 0.4
and 0.2, respectively. The probability that a user of brand Y will switch to brand X
or Z is 0.2 and 0.2, respectively. The probability that a user of brand Z will switch to
brand X or Y is 0.1 and 0.1, respectively. The transition matrix for this process is

$P_{k+1} = AP_k = \begin{bmatrix} 0.4 & 0.2 & 0.1 \\ 0.4 & 0.6 & 0.1 \\ 0.2 & 0.2 & 0.8 \end{bmatrix} \begin{bmatrix} x^{(k)} \\ y^{(k)} \\ z^{(k)} \end{bmatrix}$.

(a) Verify that $\lambda = 1$ is an eigenvalue of A.

(b) Determine the steady-state distribution for a population of 80,000.

6. Suppose that the coffee industry consists of five brands $B_1$, $B_2$, $B_3$, $B_4$, and $B_5$.
Assume that each customer purchases a 3-pound can of coffee each month and 60 million
pounds of coffee is sold each month. Regardless of brand, each pound of coffee
represents a profit of one dollar. The coffee industry has empirically determined the
following transition matrix A for monthly coffee sales, where $a_{ij}$ represents the
probability that a customer will purchase brand $B_i$ given that their previous purchase was
brand $B_j$.

$A = \begin{bmatrix} 0.1 & 0.2 & 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.1 & 0.1 & 0.2 \\ 0.1 & 0.3 & 0.4 & 0.1 & 0.2 \\ 0.3 & 0.3 & 0.1 & 0.1 & 0.2 \\ 0.4 & 0.1 & 0.2 & 0.1 & 0.2 \end{bmatrix}$

An advertising agency guarantees the manufacturer of brand $B_1$ that, for \$40 million
a year, they can change the first column of A to $[0.3\ 0.1\ 0.1\ 0.2\ 0.3]'$. Should the
manufacturer of brand $B_1$ hire the advertising agency?

7. Write a program, based on the deflation technique in Exercise 4, to find all the
eigenvalues of a given matrix. Your program should call Program 11.1 as a subroutine to
determine the dominant eigenvalue and eigenvector at each iteration.

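8. Use your program from Problem 7 to find all the eigenvalues of the following
matrices.

(a) $A = \begin{bmatrix} 1 & 2 & -1 \\ 1 & 0 & 1 \\ 4 & -4 & 5 \end{bmatrix}$

(b) $A = [a_{ij}]$, where $a_{ij} = \begin{cases} i + j, & i = j \\ ij, & i \ne j \end{cases}$ and i, j = 1, 2, ..., 15.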

11.3 Jacobi’s Method 

Jacobi's method is an easily understood algorithm for finding all eigenpairs for a
symmetric matrix. It is a reliable method that produces uniformly accurate answers for the
results. For matrices of order up to 10, the algorithm is competitive with more
sophisticated ones. If speed is not a major consideration, it is quite acceptable for matrices
up to order 20.

A solution is guaranteed for all real symmetric matrices when Jacobi's method is
used. This limitation is not severe since many practical problems of applied
mathematics and engineering involve symmetric matrices. From a theoretical viewpoint,
the method embodies techniques that are found in more sophisticated algorithms. For
instructive purposes, it is worthwhile to investigate the details of Jacobi's method.

Plane Rotations 

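We start with some geometrical background about coordinate transformations. Let X
denote a vector in n-dimensional space and consider the linear transformation Y =
RX, where R is an n × n matrix:

$R = \begin{bmatrix} 1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & & \vdots & & \vdots & & \vdots \\ 0 & \cdots & \cos\phi & \cdots & \sin\phi & \cdots & 0 \\ \vdots & & \vdots & & \vdots & & \vdots \\ 0 & \cdots & -\sin\phi & \cdots & \cos\phi & \cdots & 0 \\ \vdots & & \vdots & & \vdots & & \vdots \\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 1 \end{bmatrix}$,

where $\cos\phi$ occupies the positions (p, p) and (q, q), $\sin\phi$ occupies position (p, q), and
$-\sin\phi$ occupies position (q, p).

Here all off-diagonal elements of R are zero except for the values $\pm\sin\phi$, and all
diagonal elements are 1 except for $\cos\phi$. The effect of the transformation Y = RX is
easy to grasp:

$y_j = x_j$ when $j \ne p$ and $j \ne q$,
$y_p = x_p\cos\phi + x_q\sin\phi$,
$y_q = -x_p\sin\phi + x_q\cos\phi$.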
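A plane rotation is easy to build and test numerically. In the sketch below the dimension n = 4, the indices p = 2 and q = 4, and the vector X are assumptions made for illustration; the angle is chosen with atan2 so that the component $y_q$ of the image vanishes.

```matlab
% Sketch: build a plane rotation R in the x_p x_q-plane and apply it.
n = 4; p = 2; q = 4;         % assumed dimension and rotation indices
X = [1; 3; -2; 4];           % assumed test vector

% Angle with cos(phi) = x_p/r and sin(phi) = x_q/r, which makes y_q = 0.
phi = atan2(X(q), X(p));

R = eye(n);
R(p,p) =  cos(phi);  R(p,q) = sin(phi);
R(q,p) = -sin(phi);  R(q,q) = cos(phi);

Y = R * X;                   % only components p and q change
orthoErr = norm(R' * R - eye(n));

disp(Y');
disp(orthoErr);
```

With this choice of angle the rotated component $y_p$ equals $\sqrt{x_p^2 + x_q^2}$, here 5, while the remaining components are untouched.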

The transformation is seen to be a rotation of n-dimensional space in the $x_px_q$-plane
through the angle $\phi$. By selecting an appropriate angle $\phi$, we could make either $y_p = 0$
or $y_q = 0$ in the image. The inverse transformation $X = R^{-1}Y$ rotates space in the
same $x_px_q$-plane through the angle $-\phi$. Observe that R is an orthogonal matrix; that
is,

$R^{-1} = R'$ or $R'R = I$.

Similarity and Orthogonal Transformations

Consider the eigenproblem

(1) $AX = \lambda X$.

Suppose that K is a nonsingular matrix and that B is defined by

(2) $B = K^{-1}AK$.

Multiply both members of (2) on the right side by the quantity $K^{-1}X$. This produces

(3) $BK^{-1}X = K^{-1}AKK^{-1}X = K^{-1}AX = K^{-1}\lambda X = \lambda K^{-1}X$.

We define the change of variable

(4) $Y = K^{-1}X$ or $X = KY$.

When (4) is used in (3), the new eigenproblem is

(5) $BY = \lambda Y$.

Comparing (1) and (5), we see that the similarity transformation (2) preserved the
eigenvalue $\lambda$ and that the eigenvectors are different, but are related by the change of
variable in (4).

Suppose that the matrix R is an orthogonal matrix (i.e., $R^{-1} = R'$) and that D is
defined by

(6) $D = R'AR$.

Multiply both terms in (6) on the right by $R'X$ to obtain

(7) $DR'X = R'ARR'X = R'AX = R'\lambda X = \lambda R'X$.

We define the change of variable

(8) $Y = R'X$ or $X = RY$.

Now use (8) in (7) to obtain a new eigenproblem,

(9) $DY = \lambda Y$.

As before, the eigenvalues of (1) and (9) are the same. However, for equation (9) the
change of variable (8) makes it easier to convert X to Y and Y back into X because
$R^{-1} = R'$.

In addition, suppose that A is a symmetric matrix (i.e., A = A'). Then we find that

(10) $D' = (R'AR)' = R'A'(R')' = R'AR = D$.
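Both properties of the orthogonal similarity transformation, unchanged eigenvalues and preserved symmetry, can be observed numerically. The sketch below assumes a small symmetric test matrix and a plane rotation with the arbitrarily chosen angle 0.3 and indices p = 1, q = 2.

```matlab
% Sketch: D = R'AR keeps the eigenvalues and the symmetry of A.
A = [3 -1 0; -1 2 -1; 0 -1 3];       % assumed symmetric test matrix
phi = 0.3; p = 1; q = 2;             % assumed rotation angle and indices

R = eye(3);
R(p,p) =  cos(phi);  R(p,q) = sin(phi);
R(q,p) = -sin(phi);  R(q,q) = cos(phi);

D = R' * A * R;

symErr = norm(D - D');               % zero up to rounding
Ds = (D + D') / 2;                   % remove rounding asymmetry before eig
eigErr = norm(sort(eig(Ds)) - sort(eig(A)));

disp([symErr eigErr]);
```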

Hence D is a symmetric matrix. Therefore, we conclude that if A is a symmetric matrix 
and R is an orthogonal matrix the transformation of A to D given by (6) preserves 
symmetry as well as eigenvalues. The relationship between their eigenvectors is given 
by the change of variable (8). 
Jacobi Series of Transformations

Start with the real symmetric matrix $A$. Then construct the sequence of orthogonal
matrices $R_1, R_2, \ldots, R_n$ as follows:

(11) $D_0 = A,$
     $D_j = R_j'D_{j-1}R_j$ for $j = 1, 2, \ldots.$

We will show how to construct the sequence $\{R_j\}$ so that

(12) $\lim_{j \to \infty} D_j = D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n).$

In practice we will stop when the off-diagonal elements are close to zero. Then we will
have

(13) $D_n \approx D.$

The construction produces

(14) $D_n = R_n'R_{n-1}' \cdots R_1'\,A\,R_1R_2 \cdots R_{n-1}R_n.$

If we define

(15) $R = R_1R_2 \cdots R_{n-1}R_n,$

then $R^{-1}AR = D$, which implies that

(16) $AR = RD = R\operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n).$

Let the columns of $R$ be denoted by the vectors $X_1, X_2, \ldots, X_n$. Then $R$ can be
expressed as a row vector of column vectors:

(17) $R = [X_1 \;\; X_2 \;\; \ldots \;\; X_n].$

The columns of the products in (16) now take on the form

(18) $[AX_1 \;\; AX_2 \;\; \ldots \;\; AX_n] = [\lambda_1X_1 \;\; \lambda_2X_2 \;\; \ldots \;\; \lambda_nX_n].$

From (17) and (18) we see that the vector $X_j$, which is the $j$th column of $R$, is an
eigenvector that corresponds to the eigenvalue $\lambda_j$.
General Step

Each step in the Jacobi iteration will accomplish the limited objective of reduction of
the two off-diagonal elements $a_{pq}$ and $a_{qp}$ to zero. Let $R_1$ denote the first orthogonal
matrix used. Suppose that

(19) $D_1 = R_1'AR_1$

reduces the elements $a_{pq}$ and $a_{qp}$ to zero, where $R_1$ has the form

(20) $$R_1 = \begin{bmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots & & \vdots \\
0 & \cdots & c & \cdots & s & \cdots & 0 \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & -s & \cdots & c & \cdots & 0 \\
\vdots & & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{bmatrix}$$

in which the rows containing $c, s$ and $-s, c$ are rows $p$ and $q$, and the corresponding
columns are columns $p$ and $q$.

Here all off-diagonal elements of $R_1$ are zero except for the element $s$ located in
row $p$, column $q$ and the element $-s$ located in row $q$, column $p$. Also note that all
diagonal elements are 1 except for the element $c$, which appears at two locations, in
row $p$, column $p$, and in row $q$, column $q$. The matrix is a plane rotation where we
have used the notation $c = \cos\phi$ and $s = \sin\phi$.

We must verify that the transformation (19) will produce a change only to rows $p$
and $q$ and columns $p$ and $q$. Consider postmultiplication of $A$ by $R_1$ and the product

(21) $B = AR_1.$

The row by column rule for multiplication applies, and we observe that there is no
change to columns 1 to $p - 1$ and $p + 1$ to $q - 1$ and $q + 1$ to $n$. Hence only columns $p$
and $q$ are altered.

(22) $b_{jk} = a_{jk}$ when $k \ne p$ and $k \ne q$,
     $b_{jp} = c\,a_{jp} - s\,a_{jq}$ for $j = 1, 2, \ldots, n$,
     $b_{jq} = s\,a_{jp} + c\,a_{jq}$ for $j = 1, 2, \ldots, n$.
A similar argument shows that premultiplication of $A$ by $R_1'$ will only alter rows $p$
and $q$. Therefore, the transformation

(23) $D_1 = R_1'AR_1$

will alter only columns $p$ and $q$ and rows $p$ and $q$ of $A$. The elements $d_{jk}$ of $D_1$ are
computed with the formulas

(24) $d_{jp} = c\,a_{jp} - s\,a_{jq}$ when $j \ne p$ and $j \ne q$,
     $d_{jq} = s\,a_{jp} + c\,a_{jq}$ when $j \ne p$ and $j \ne q$,
     $d_{pp} = c^2a_{pp} + s^2a_{qq} - 2cs\,a_{pq}$,
     $d_{qq} = s^2a_{pp} + c^2a_{qq} + 2cs\,a_{pq}$,
     $d_{pq} = (c^2 - s^2)a_{pq} + cs(a_{pp} - a_{qq})$,

and the other elements of $D_1$ are found by symmetry.

Zeroing out $d_{pq}$ and $d_{qp}$

The goal for each step of Jacobi's iteration is to make the two off-diagonal elements
$d_{pq}$ and $d_{qp}$ zero. The obvious strategy would be to observe the fact that

(25) $c = \cos\phi$ and $s = \sin\phi$,

where $\phi$ is the angle of rotation that produces the desired effect. However, some
ingenious maneuvers with trigonometric identities are now required. The identity for
$\cot 2\phi$ is used with (25) to define

(26) $\theta = \cot 2\phi = \dfrac{c^2 - s^2}{2cs}.$

Suppose that $a_{pq} \ne 0$ and we want to produce $d_{pq} = 0$. Then using the last
equation in (24), we obtain

(27) $0 = (c^2 - s^2)a_{pq} + cs(a_{pp} - a_{qq}).$

This can be rearranged to yield $(c^2 - s^2)/(cs) = (a_{qq} - a_{pp})/a_{pq}$, which is used in (26)
to solve for $\theta$:

(28) $\theta = \dfrac{a_{qq} - a_{pp}}{2a_{pq}}.$
Although we can use (28) with formulas (25) and (26) to compute $c$ and $s$, less
round-off error is propagated if we compute $\tan\phi$ and use it in later computations. So
we define

(29) $t = \tan\phi = \dfrac{s}{c}.$

Now divide the numerator and denominator in (26) by $c^2$ to obtain

$\theta = \dfrac{1 - s^2/c^2}{2s/c} = \dfrac{1 - t^2}{2t},$

which yields the equation

(30) $t^2 + 2t\theta - 1 = 0.$

Since $t = \tan\phi$, the smaller root of (30) corresponds to the smaller angle of rotation
with $|\phi| \le \pi/4$. The special form of the quadratic formula for finding this root is

(31) $t = -\theta \pm (\theta^2 + 1)^{1/2} = \dfrac{\operatorname{sign}(\theta)}{|\theta| + (\theta^2 + 1)^{1/2}},$

where $\operatorname{sign}(\theta) = 1$ when $\theta \ge 0$ and $\operatorname{sign}(\theta) = -1$ when $\theta < 0$. Then $c$ and $s$ are
computed with the formulas

(32) $c = \dfrac{1}{(t^2 + 1)^{1/2}}$ and $s = ct.$


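Summary of the General Step

We can now outline the calculations required to zero out the element $d_{pq}$. First, select
row $p$ and column $q$ for which $a_{pq} \ne 0$. Second, form the preliminary quantities

(33) $\theta = \dfrac{a_{qq} - a_{pp}}{2a_{pq}},$
     $t = \dfrac{\operatorname{sign}(\theta)}{|\theta| + (\theta^2 + 1)^{1/2}},$
     $c = \dfrac{1}{(t^2 + 1)^{1/2}},$
     $s = ct.$

Third, to construct $D = D_1$, use the formulas of (24) with $d_{pq} = 0$:

(34)
    $d_{pq} = 0$;  $d_{qp} = 0$;
    $d_{pp} = c^2a_{pp} + s^2a_{qq} - 2cs\,a_{pq}$;
    $d_{qq} = s^2a_{pp} + c^2a_{qq} + 2cs\,a_{pq}$;
    for j = 1 : N
        if (j ~= p) and (j ~= q)
            $d_{jp} = c\,a_{jp} - s\,a_{jq}$;
            $d_{pj} = d_{jp}$;
            $d_{jq} = c\,a_{jq} + s\,a_{jp}$;
            $d_{qj} = d_{jq}$;
        end
    end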
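
The three parts of the summary can be sketched in MATLAB as follows. This is an illustration under stated assumptions, not the text's program: the matrix is the one used in Example 11.7, the pair $(p, q) = (1, 3)$ is chosen by hand, and the explicit rotation matrix R is built only to compare against the element-by-element update.

```matlab
% Sketch: one Jacobi rotation via (33) and (34), compared with R'*A*R.
A = [8 -1 3 -1; -1 6 2 0; 3 2 9 1; -1 0 1 7];   % matrix of Example 11.7
p = 1;  q = 3;  n = size(A,1);                  % zero out a_13 (chosen by hand)
theta = (A(q,q) - A(p,p)) / (2*A(p,q));         % first line of (33)
t = 1 / (abs(theta) + sqrt(theta^2 + 1));       % magnitude of t
if theta < 0, t = -t; end                       % sign(theta), with sign(0) = 1
c = 1 / sqrt(t^2 + 1);
s = c*t;
D = A;                                          % entries outside rows/cols p,q stay
D(p,q) = 0;  D(q,p) = 0;
D(p,p) = c^2*A(p,p) + s^2*A(q,q) - 2*c*s*A(p,q);
D(q,q) = s^2*A(p,p) + c^2*A(q,q) + 2*c*s*A(p,q);
for j = 1:n
    if j ~= p && j ~= q
        D(j,p) = c*A(j,p) - s*A(j,q);  D(p,j) = D(j,p);
        D(j,q) = c*A(j,q) + s*A(j,p);  D(q,j) = D(j,q);
    end
end
R = eye(n);  R([p q],[p q]) = [c s; -s c];      % explicit rotation, for checking only
err = norm(D - R'*A*R);
fprintf('c = %.6f, s = %.6f, err = %.1e\n', c, s, err);
```

For this matrix the computed values should agree with $c = 0.763020$ and $s = 0.646375$ quoted in Example 11.7, and err should be at round-off level.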
Updating the Matrix of Eigenvectors

We need to keep track of the matrix product $R_1R_2 \cdots R_n$. When we stop at the $n$th
iteration, we will have computed

(35) $V_n = R_1R_2 \cdots R_n,$

where $V_n$ is an orthogonal matrix. We need only keep track of the current matrix $V_j$,
for $j = 1, 2, \ldots, n$. Start by initializing $V = I$. Use the vector variables XP and XQ
to store columns $p$ and $q$ of $V$, respectively. Then for each step perform the calculation

(36)
    for j = 1 : N
        $XP_j = v_{jp}$;
        $XQ_j = v_{jq}$;
    end
    for j = 1 : N
        $v_{jp} = c\,XP_j - s\,XQ_j$;
        $v_{jq} = s\,XP_j + c\,XQ_j$;
    end
Strategy for Eliminating $a_{pq}$

The speed of convergence of Jacobi's method is seen by considering the sum of the
squares of the off-diagonal elements:

(37) $S_1 = \sum_{j,k=1,\; k \ne j}^{n} |a_{jk}|^2$

(38) $S_2 = \sum_{j,k=1,\; k \ne j}^{n} |d_{jk}|^2$, where $D_1 = R'AR$.

The reader can verify that the equations given in (34) can be used to prove that

(39) $S_2 = S_1 - 2|a_{pq}|^2.$

At each step we let $S_j$ denote the sum of the squares of the off-diagonal elements
of $D_j$. Then the sequence $\{S_j\}$ decreases monotonically and is bounded below by zero.
Jacobi's original algorithm of 1846 selected, at each step, the off-diagonal element $a_{pq}$
of largest magnitude to zero out and involved a search to compute the value

(40) $\max\{A\} = \max_{p<q}\{|a_{pq}|\}.$

This choice will guarantee that $\{S_j\}$ converges to zero. As a consequence, this proves
that $\{D_j\}$ converges to $D$ and $\{V_j\}$ converges to the matrix $V$ of eigenvectors (see
Reference [168]).

Jacobi's search can become time consuming since it requires an order of $(n^2 - n)/2$
comparisons in a loop. It is prohibitive for larger values of $n$. A better strategy is the
cyclic Jacobi method, where one annihilates elements in a strict order across the rows.
A tolerance value $\epsilon$ is selected; then a sweep is made throughout the matrix and, if an
element $a_{pq}$ is found to be larger than $\epsilon$, it is zeroed out. For one sweep through the
matrix the elements are checked in row 1, $a_{12}, a_{13}, \ldots, a_{1n}$; then row 2, $a_{23}, a_{24}, \ldots,
a_{2n}$; and so on. It has been proved that the convergence rate is quadratic for both the
original and cyclic Jacobi methods. An implementation of the cyclic Jacobi method
starts by observing that the sum of the squares of the diagonal elements increases with
each iteration; that is, if

(41) $T_0 = \sum_{j=1}^{n} |a_{jj}|^2$ and $T_1 = \sum_{j=1}^{n} |d_{jj}|^2,$

then

$T_1 = T_0 + 2|a_{pq}|^2.$

Consequently, the sequence $\{D_j\}$ converges to the diagonal matrix $D$. Notice that the
average size of a diagonal element can be computed with the formula $(T_0/n)^{1/2}$. The
magnitudes of the off-diagonal elements are compared to $\epsilon(T_0/n)^{1/2}$, where $\epsilon$ is the
preassigned tolerance. Therefore, the element $a_{pq}$ is zeroed out if

(42) $|a_{pq}| > \epsilon\left(\dfrac{T_0}{n}\right)^{1/2}.$

Another variation of the method, called the threshold Jacobi method, is left for the
reader to investigate (see Reference [178]).
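
The two bookkeeping identities, $S_2 = S_1 - 2|a_{pq}|^2$ and $T_1 = T_0 + 2|a_{pq}|^2$, can be checked on a small case. The sketch below assumes the $4 \times 4$ symmetric matrix of Example 11.7 and the hand-picked pair $(p, q) = (1, 3)$; the helper name offsq is introduced here only for illustration.

```matlab
% Sketch: verify S2 = S1 - 2*a_pq^2 and T1 = T0 + 2*a_pq^2 for one rotation.
A = [8 -1 3 -1; -1 6 2 0; 3 2 9 1; -1 0 1 7];   % symmetric test matrix
p = 1;  q = 3;                                  % element to annihilate (assumed)
theta = (A(q,q) - A(p,p)) / (2*A(p,q));
t = 1 / (abs(theta) + sqrt(theta^2 + 1));
if theta < 0, t = -t; end
c = 1 / sqrt(t^2 + 1);  s = c*t;
R = eye(4);  R([p q],[p q]) = [c s; -s c];
D1 = R'*A*R;
offsq = @(M) sum(sum(M.^2)) - sum(diag(M).^2);  % sum of squared off-diagonal entries
S1 = offsq(A);           S2 = offsq(D1);
T0 = sum(diag(A).^2);    T1 = sum(diag(D1).^2);
fprintf('S1 = %g, S2 = %g, T0 = %g, T1 = %g\n', S1, S2, T0, T1);
```

Here $S_1 = 32$ and $T_0 = 230$ follow directly from the entries of the matrix, so with $a_{13} = 3$ the identities predict $S_2 = 14$ and $T_1 = 248$ up to round-off.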

Example 11.7. Use Jacobi iteration to transform the following symmetric matrix into
diagonal form.

$$\begin{bmatrix} 8 & -1 & 3 & -1 \\ -1 & 6 & 2 & 0 \\ 3 & 2 & 9 & 1 \\ -1 & 0 & 1 & 7 \end{bmatrix}$$

The computational details are left for the reader. The first rotation matrix that will zero
out $a_{13} = 3$ is

$$R_1 = \begin{bmatrix} 0.763020 & 0.000000 & 0.646375 & 0.000000 \\ 0.000000 & 1.000000 & 0.000000 & 0.000000 \\ -0.646375 & 0.000000 & 0.763020 & 0.000000 \\ 0.000000 & 0.000000 & 0.000000 & 1.000000 \end{bmatrix}.$$

Calculation reveals that $A_2 = R_1'A_1R_1$ is

$$A_2 = \begin{bmatrix} 5.458619 & -2.055770 & 0.000000 & -1.409395 \\ -2.055770 & 6.000000 & 0.879665 & 0.000000 \\ 0.000000 & 0.879665 & 11.541381 & 0.116645 \\ -1.409395 & 0.000000 & 0.116645 & 7.000000 \end{bmatrix}.$$

Next, the element $a_{12} = -2.055770$ is zeroed out and we get

$$A_3 = \begin{bmatrix} 3.655795 & 0.000000 & 0.579997 & -1.059649 \\ 0.000000 & 7.802824 & 0.661373 & 0.929268 \\ 0.579997 & 0.661373 & 11.541381 & 0.116645 \\ -1.059649 & 0.929268 & 0.116645 & 7.000000 \end{bmatrix}.$$

After further iterations we arrive at

$$\begin{bmatrix} 3.295870 & 0.002521 & 0.037859 & 0.000000 \\ 0.002521 & 8.405210 & -0.004957 & 0.066758 \\ 0.037859 & -0.004957 & 11.704123 & -0.001430 \\ 0.000000 & 0.066758 & -0.001430 & 6.594797 \end{bmatrix}.$$
It will take six more iterations for the diagonal elements to get close to the diagonal matrix

$D = \operatorname{diag}(3.295699, 8.407662, 11.704301, 6.592338).$

However, the off-diagonal elements are not small enough, and it will take three more
iterations for them to be less than $10^{-6}$ in magnitude. Then the eigenvectors are the columns
of the matrix $V = R_1R_2 \cdots R_{18}$, which is

$$V = \begin{bmatrix} 0.528779 & -0.573042 & 0.582298 & 0.230097 \\ 0.591967 & 0.472301 & 0.175776 & -0.628975 \\ -0.536039 & 0.282050 & 0.792487 & -0.071235 \\ 0.287454 & 0.607455 & 0.044680 & 0.739169 \end{bmatrix}.$$


Program 11.3 (Jacobi Iteration for Eigenvalues and Eigenvectors). To compute
the full set of eigenpairs $\{\lambda_j, V_j\}_{j=1}^{n}$ of the $n \times n$ real symmetric matrix $A$. Jacobi
iteration is used to find all eigenpairs.

function [V,D]=jacobi1(A,epsilon)
%Input  - A is an nxn matrix
%       - epsilon is the tolerance
%Output - V is the nxn matrix of eigenvectors
%       - D is the diagonal nxn matrix of eigenvalues
%Initialize V,D,and parameters
D=A;
n=size(A,1);
V=eye(n);
state=1;
%Calculate row p and column q of the off-diagonal element
%of greatest magnitude in A
[m1 p]=max(abs(D-diag(diag(D))));
[m2 q]=max(m1);
p=p(q);
while(state==1)
   %Zero out Dpq and Dqp; t is the smaller root of (30), as in (33),
   %which also avoids division by zero when D(p,p)=D(q,q)
   theta=(D(q,q)-D(p,p))/(2*D(p,q));
   t=1/(abs(theta)+sqrt(theta^2+1));
   if (theta<0)
      t=-t;
   end
   c=1/sqrt(t^2+1);
   s=c*t;
   R=[c s; -s c];
   D([p q],:)=R'*D([p q],:);
   D(:,[p q])=D(:,[p q])*R;
   V(:,[p q])=V(:,[p q])*R;
   %Locate the off-diagonal element of greatest magnitude in the new D
   [m1 p]=max(abs(D-diag(diag(D))));
   [m2 q]=max(m1);
   p=p(q);
   %Stop when that element is small relative to the diagonal, see (42)
   if (abs(D(p,q))<epsilon*sqrt(sum(diag(D).^2)/n))
      state=0;
   end
end
D=diag(diag(D));


Exercises for Jacobi's Method

1. Mass-spring systems. Consider the undamped mass-spring system shown in Figure 11.3.
The mathematical model describing the displacements from static equilibrium is

$$\begin{bmatrix} k_1 + k_2 & -k_2 & 0 \\ -k_2 & k_2 + k_3 & -k_3 \\ 0 & -k_3 & k_3 \end{bmatrix}
\begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{bmatrix} +
\begin{bmatrix} m_1 & 0 & 0 \\ 0 & m_2 & 0 \\ 0 & 0 & m_3 \end{bmatrix}
\begin{bmatrix} x_1''(t) \\ x_2''(t) \\ x_3''(t) \end{bmatrix} =
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$

(a) Use the substitutions $x_j(t) = v_j\sin(\omega t + \theta)$ for $j = 1, 2, 3$, where $\theta$ is a
constant, and show that the solution to the mathematical model can be reformulated
as follows:

$$\begin{bmatrix} \dfrac{k_1 + k_2}{m_1} & \dfrac{-k_2}{m_1} & 0 \\ \dfrac{-k_2}{m_2} & \dfrac{k_2 + k_3}{m_2} & \dfrac{-k_3}{m_2} \\ 0 & \dfrac{-k_3}{m_3} & \dfrac{k_3}{m_3} \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \omega^2 \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}.$$

(b) Set $\lambda = \omega^2$; then the three solutions to part (a) are the eigenpairs $\lambda_j$, $V_j =
[v_1^{(j)} \;\; v_2^{(j)} \;\; v_3^{(j)}]'$ for $j = 1, 2, 3$. Show that they are used to form the three
fundamental solutions:

$$X_j(t) = \begin{bmatrix} v_1^{(j)}\sin(\omega_j t + \theta) \\ v_2^{(j)}\sin(\omega_j t + \theta) \\ v_3^{(j)}\sin(\omega_j t + \theta) \end{bmatrix}
= \begin{bmatrix} v_1^{(j)} \\ v_2^{(j)} \\ v_3^{(j)} \end{bmatrix}\sin(\omega_j t + \theta),$$

where $\omega_j = \sqrt{\lambda_j}$, for $j = 1, 2, 3$.

Remark. These three solutions are referred to as the three principal modes of
vibration.

2. The homogeneous linear system of differential equations

$x_1'(t) = x_1(t) + x_2(t)$
$x_2'(t) = -2x_1(t) + 4x_2(t)$

can be written in the matrix form:

$$X'(t) = \begin{bmatrix} x_1'(t) \\ x_2'(t) \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix}\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = AX(t).$$

(a) Verify that $2, [1 \;\; 1]'$ and $3, [1 \;\; 2]'$ are eigenpairs of the matrix $A$.

(b) By direct substitution into the matrix form of the system, verify that both $X(t) =
e^{2t}[1 \;\; 1]'$ and $X(t) = e^{3t}[1 \;\; 2]'$ are solutions of the system of differential
equations.

(c) By direct substitution into the matrix form of the system, verify that $X(t) =
c_1e^{2t}[1 \;\; 1]' + c_2e^{3t}[1 \;\; 2]'$ is the general solution of the system of differential
equations.

Remark. If the matrix $A$ has $n$ distinct eigenvalues, then it will have $n$ linearly
independent eigenvectors. In this case the general solution of a homogeneous
system of differential equations can be written as a linear combination; that is,
$X(t) = c_1e^{\lambda_1 t}V_1 + c_2e^{\lambda_2 t}V_2 + \cdots + c_ne^{\lambda_n t}V_n$.

3. Use the technique (by hand) outlined in Exercise 2 to solve each of the following
initial value problems.

(a) $x_1' = 4x_1 + 2x_2$
    $x_2' = 3x_1 - x_2$
    with $x_1(0) = 1$, $x_2(0) = 2$

(b) $x_1' = 2x_1 - 12x_2$
    $x_2' = x_1 - 5x_2$
    with $x_1(0) = 2$, $x_2(0) = 2$

(c) $x_1' = x_2$
    $x_2' = x_3$
    $x_3' = 8x_1 - 14x_2 + 7x_3$
    with $x_1(0) = 1$, $x_2(0) = 2$, $x_3(0) = 3$
Algorithms and Programs

1. Use Program 11.3 to find the eigenpairs of the given matrix with a tolerance of $\epsilon =
10^{-7}$. Compare your results with those obtained from the MATLAB command eig
by entering [eig(A) diag(D)] in the MATLAB command window.

(a) $$A = \begin{bmatrix} 4 & 3 & 2 & 1 \\ 3 & 4 & 3 & 2 \\ 2 & 3 & 4 & 3 \\ 1 & 2 & 3 & 4 \end{bmatrix}$$

(b) $$A = \begin{bmatrix} 2.25 & -0.25 & -1.25 & 2.75 \\ -0.25 & 2.25 & 2.75 & 1.25 \\ -1.25 & 2.75 & 2.25 & -0.25 \\ 2.75 & 1.25 & -0.25 & 2.25 \end{bmatrix}$$

(c) $A = [a_{ij}]$, where $a_{ij} = \begin{cases} i + j & i = j \\ ij & i \ne j \end{cases}$ and $i, j = 1, 2, \ldots, 30$.

(d) $A = [a_{ij}]$, where $a_{ij} = \begin{cases} \cos(\sin(i + j)) & i = j \\ i + ij + j & i \ne j \end{cases}$ and $i, j = 1, 2, \ldots, 40$.

2. Use the technique outlined in Exercise 1 and Program 11.3 to find the eigenpairs and
the three principal modes of vibration for the undamped mass-spring systems with the
following coefficients.

(a) $k_1 = 3$, $k_2 = 2$, $k_3 = 1$, $m_1 = 1$, $m_2 = 1$, $m_3 = 1$

(b) k\ = 5 , k 2 = j, *3 == =4, m 2 =4,m 3 =4 

(c) $k_1 = 0.2$, $k_2 = 0.4$, $k_3 = 0.3$, $m_1 = 2.5$, $m_2 = 2.5$, $m_3 = 2.5$

3. Use the technique outlined in Exercise 2 and Program 11.3 to find the general solution
of the given homogeneous system of differential equations.

(a) $x_1' = 4x_1 + 3x_2 + 2x_3 + x_4$
    $x_2' = 3x_1 + 4x_2 + 3x_3 + 2x_4$
    $x_3' = 2x_1 + 3x_2 + 4x_3 + 3x_4$
    $x_4' = x_1 + 2x_2 + 3x_3 + 4x_4$

(b) $x_1' = 5x_1 + 4x_2 + 3x_3 + 2x_4 + x_5$
    $x_2' = 4x_1 + 5x_2 + 4x_3 + 3x_4 + 2x_5$
    $x_3' = 3x_1 + 4x_2 + 5x_3 + 4x_4 + 3x_5$
    $x_4' = 2x_1 + 3x_2 + 4x_3 + 5x_4 + 4x_5$
    $x_5' = x_1 + 2x_2 + 3x_3 + 4x_4 + 5x_5$

4. Modify Program 11.3 to implement the "cyclic" Jacobi method.

5. Use your program from Problem 4 on the symmetric matrices in Problem 1. In
particular, compare the number of iterations required by your cyclic program and
Program 11.3 to satisfy the given tolerance.





The crucial step is to use (7) and express $c$ in the form

(8) $c = -2(W'X).$

Now (8) can be used in (6) to see that

$Y = X + cW = X - 2W'XW.$

Since the quantity $W'X$ is a scalar, the last equation can be written as

(9) $Y = X - 2WW'X = (I - 2WW')X.$

Looking at (9), we see that $P = I - 2WW'$. The matrix $P$ is symmetric because

$P' = (I - 2WW')' = I - 2(WW')' = I - 2WW' = P.$

The following calculation shows that $P$ is orthogonal:

$P'P = (I - 2WW')(I - 2WW') = I - 4WW' + 4WW'WW' = I - 4WW' + 4WW' = I,$

where $W'W = 1$ has been used, and the proof is complete. •

It should be observed that the effect of the mapping $Y = PX$ is to reflect $X$
through the line whose direction is $Z$, hence the name Householder reflection.

Corollary 11.3 ($k$th Householder Matrix). Let $A$ be an $n \times n$ matrix, and $X$ any
vector. If $k$ is an integer with $1 \le k \le n - 2$, we can construct a vector $W_k$ and matrix
$P_k = I - 2W_kW_k'$ so that

(10) $$P_kX = P_k\begin{bmatrix} x_1 \\ \vdots \\ x_k \\ x_{k+1} \\ x_{k+2} \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} x_1 \\ \vdots \\ x_k \\ -S \\ 0 \\ \vdots \\ 0 \end{bmatrix} = Y.$$

Proof. The key is to define the value $S$ so that $\|X\|_2 = \|Y\|_2$ and then invoke
Theorem 11.23. The proper value for $S$ must satisfy

(11) $S^2 = x_{k+1}^2 + x_{k+2}^2 + \cdots + x_n^2,$

which is readily verified by computing the norms of $X$ and $Y$:

(12) $\|X\|_2^2 = x_1^2 + x_2^2 + \cdots + x_n^2 = x_1^2 + \cdots + x_k^2 + S^2 = \|Y\|_2^2.$

The vector $W$ is found by using equation (3) of Theorem 11.23:

(13) $W = \dfrac{1}{R}(X - Y) = \dfrac{1}{R}[0 \;\; \ldots \;\; 0 \;\; (x_{k+1} + S) \;\; x_{k+2} \;\; \ldots \;\; x_n]'.$

Less round-off error is propagated when the sign of $S$ is chosen to be the same as the
sign of $x_{k+1}$; hence we compute

(14) $S = \operatorname{sign}(x_{k+1})(x_{k+1}^2 + x_{k+2}^2 + \cdots + x_n^2)^{1/2}.$

The number $R$ in (13) is chosen so that $\|W\|_2 = 1$ and must satisfy

(15) $R^2 = (x_{k+1} + S)^2 + x_{k+2}^2 + \cdots + x_n^2$
     $\phantom{R^2} = 2x_{k+1}S + S^2 + x_{k+1}^2 + x_{k+2}^2 + \cdots + x_n^2$
     $\phantom{R^2} = 2x_{k+1}S + 2S^2.$

Therefore, the matrix $P_k$ is given by the formula

(16) $P_k = I - 2WW',$

and the proof is complete.
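
The construction in the corollary can be sketched in a few lines of MATLAB. The vector X and the index k below are assumed test values, and the name Rn stands for the number $R$ of (13) and (15); it is renamed here only to avoid confusion with the rotation matrices of Jacobi's method. The sketch also assumes $S \ne 0$, since otherwise the division in (13) is undefined.

```matlab
% Sketch: build W and P_k from (11)-(16) and check that P_k*X has the form (10).
X = [4; 2; 2; 1];                        % test vector (assumed)
k = 1;  n = length(X);
S = norm(X(k+1:n));                      % magnitude from (11)
if X(k+1) < 0, S = -S; end               % (14): S takes the sign of x_{k+1}
Rn = sqrt(2*X(k+1)*S + 2*S^2);           % (15)
W = zeros(n,1);
W(k+1)   = (X(k+1) + S)/Rn;              % (13)
W(k+2:n) = X(k+2:n)/Rn;
P = eye(n) - 2*(W*W');                   % (16)
Y = P*X;                                 % should equal [x_1; -S; 0; 0]
disp(Y')
```

For this vector $S = 3$ and $R^2 = 30$, so the result should be $Y = [4 \;\; -3 \;\; 0 \;\; 0]'$ up to round-off.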


Householder Transformation

Suppose that $A$ is a symmetric $n \times n$ matrix. Then a sequence of $n - 2$ transformations
of the form $PAP$ will reduce $A$ to a symmetric tridiagonal matrix. Let us visualize
the process when $n = 5$. The first transformation is defined to be $P_1AP_1$, where $P_1$
is constructed by applying Corollary 11.3, with the vector $X$ being the first column of
the matrix $A$. The general form of $P_1$ is

(17) $$P_1 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & p & p & p & p \\ 0 & p & p & p & p \\ 0 & p & p & p & p \\ 0 & p & p & p & p \end{bmatrix},$$

where the letter $p$ stands for some element in $P_1$. As a result, the transformation
$P_1AP_1$ does not affect the element $a_{11}$ of $A$:

(18) $$P_1AP_1 = \begin{bmatrix} a_{11} & v_1 & 0 & 0 & 0 \\ u_1 & w_1 & w & w & w \\ 0 & w & w & w & w \\ 0 & w & w & w & w \\ 0 & w & w & w & w \end{bmatrix} = A_1.$$

The element denoted $u_1$ is changed because of premultiplication by $P_1$, and $v_1$ is
changed because of postmultiplication by $P_1$; since $A_1$ is symmetric, we have $u_1 = v_1$.
The changes to the elements denoted $w$ have been affected by both premultiplication
and postmultiplication. Also, since $X$ is the first column of $A$, equation (10) implies
that $u_1 = -S$.

The second Householder transformation is applied to the matrix $A_1$ defined in (18)
and is denoted $P_2A_1P_2$, where $P_2$ is constructed by applying Corollary 11.3, with the
vector $X$ being the second column of the matrix $A_1$. The form of $P_2$ is

(19) $$P_2 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & p & p & p \\ 0 & 0 & p & p & p \\ 0 & 0 & p & p & p \end{bmatrix},$$

where $p$ stands for some element in $P_2$. The $2 \times 2$ identity block in the upper-left
corner ensures that the partial tridiagonalization achieved in the first step will not be
altered by the second transformation $P_2A_1P_2$. The outcome of this transformation is

(20) $$P_2A_1P_2 = \begin{bmatrix} a_{11} & v_1 & 0 & 0 & 0 \\ u_1 & w_1 & v_2 & 0 & 0 \\ 0 & u_2 & w_2 & w & w \\ 0 & 0 & w & w & w \\ 0 & 0 & w & w & w \end{bmatrix} = A_2.$$

The elements $u_2$ and $v_2$ were affected by premultiplication and postmultiplication
by $P_2$. Additional changes have been introduced to the other elements $w$ by the
transformation.

The third Householder transformation, $P_3A_2P_3$, is applied to the matrix $A_2$ defined
in (20), where the corollary is used with $X$ being the third column of $A_2$. The
form of $P_3$ is

(21) $$P_3 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & p & p \\ 0 & 0 & 0 & p & p \end{bmatrix}.$$

Again, the $3 \times 3$ identity block ensures that $P_3A_2P_3$ does not affect the elements
of $A_2$ that lie in the upper $3 \times 3$ corner, and we obtain

(22) $$P_3A_2P_3 = \begin{bmatrix} a_{11} & v_1 & 0 & 0 & 0 \\ u_1 & w_1 & v_2 & 0 & 0 \\ 0 & u_2 & w_2 & v_3 & 0 \\ 0 & 0 & u_3 & w & w \\ 0 & 0 & 0 & w & w \end{bmatrix} = A_3.$$

Thus it has taken three transformations to reduce $A$ to tridiagonal form.

For efficiency, the transformation $PAP$ is not performed in matrix form. The next
result shows that it is more efficiently carried out via some clever vector manipulations.

Theorem 11.24 (Computation of One Householder Transformation). If $P$ is a
Householder matrix, the transformation $PAP$ is accomplished as follows. Let

(23) $V = AW$

and compute

(24) $c = W'V$

and

(25) $Q = V - cW.$

Then

(26) $PAP = A - 2WQ' - 2QW'.$

Proof. First, form the product

$AP = A(I - 2WW') = A - 2AWW'.$

Using equation (23), this is written as

(27) $AP = A - 2VW'.$

Now use (27) and write

(28) $PAP = (I - 2WW')(A - 2VW').$

When this quantity is expanded, the term $2(2WW'VW')$ is divided into two portions
and (28) can be rewritten as

(29) $PAP = A - 2W(W'A) + 2W(W'VW') - 2VW' + 2W(W'V)W'.$

Under the assumption that $A$ is symmetric, we can use the identity $(W'A) = (W'A') =
(AW)' = V'$. The tricky part is to observe that $(W'V)$ is a scalar quantity; hence it can
commute freely about in any term. Another scalar identity, $W'V = (W'V)'$, is used to obtain
the relation $W'VW' = (W'V)W' = W'(W'V) = W'(W'V)' = ((W'V)W)' =
(W'VW)'$. These results are used in the terms of (29) in parentheses to get

(30) $PAP = A - 2WV' + 2W(W'VW)' - 2VW' + 2W'VWW'.$

Now the distributive law is used in (30) and we obtain

(31) $PAP = A - 2W(V' - (W'VW)') - 2(V - W'VW)W'.$

Finally, the definition for $Q$ given in (25) is used in (31) and the outcome is
equation (26), and the proof is complete. •
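
Formula (26) can be compared with the explicit product $PAP$ on a small case. The sketch below is an illustration only: the symmetric matrix A and the unit vector W are assumed test data, and the names B1, B2, and err are introduced here for the comparison.

```matlab
% Sketch: compare the vector form (26) with the explicit product P*A*P.
A = [4 2 2 1; 2 -3 1 1; 2 1 3 1; 1 1 1 2];   % symmetric test matrix (assumed)
W = [0; 5; 2; 1]/sqrt(30);                   % unit vector, so P = I - 2WW' is orthogonal
V = A*W;                                     % (23)
c = W'*V;                                    % (24), a scalar
Q = V - c*W;                                 % (25)
B1 = A - 2*(W*Q') - 2*(Q*W');                % (26): two rank-one updates
P  = eye(4) - 2*(W*W');
B2 = P*A*P;                                  % explicit product, for checking only
err = norm(B1 - B2);
fprintf('c = %.4f, err = %.1e\n', c, err);
disp(B1)
```

The vector form needs one matrix-vector product and two outer products, whereas the explicit form needs two full matrix products; for this data the first row of the result should be $[4 \;\; -3 \;\; 0 \;\; 0]$ up to round-off.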


Reduction to Tridiagonal Form

Suppose that $A$ is a symmetric $n \times n$ matrix. Start with

(32) $A_0 = A.$

Construct the sequence $P_1, P_2, \ldots, P_{n-2}$ of Householder matrices, so that

(33) $A_k = P_kA_{k-1}P_k$ for $k = 1, 2, \ldots, n - 2$,

where $A_k$ has zeros below the subdiagonal in columns $1, 2, \ldots, k$. Then $A_{n-2}$ is a
symmetric tridiagonal matrix that is similar to $A$. This process is called Householder's
method.
The constant $c = W'V$ is then found to be

$c = -0.9.$

Then the vector $Q = V - cW = V + 0.9W$ is formed:

$Q = \dfrac{1}{\sqrt{30}}[0.000000 \;\; -7.500000 \;\; 13.800000 \;\; 9.900000]'$
$\phantom{Q} = [0.000000 \;\; -1.369306 \;\; 2.519524 \;\; 1.807484]'.$

The computation $A_1 = A_0 - 2WQ' - 2QW'$ produces

$$A_1 = \begin{bmatrix} 4.0 & -3.0 & 0.0 & 0.0 \\ -3.0 & 2.0 & -2.6 & -1.8 \\ 0.0 & -2.6 & -0.68 & -1.24 \\ 0.0 & -1.8 & -1.24 & 0.68 \end{bmatrix}.$$

The final step uses the constants $S = -3.1622777$, $R = 6.0368737$, $c = -1.2649111$ and
the vectors

$W' = [0.000000 \;\; 0.000000 \;\; -0.954514 \;\; -0.298168],$
$V' = [0.000000 \;\; 0.000000 \;\; 1.018797 \;\; 0.980843],$
$Q' = [0.000000 \;\; 0.000000 \;\; -0.188578 \;\; 0.603687].$

The tridiagonal matrix $A_2 = A_1 - 2WQ' - 2QW'$ is

$$A_2 = \begin{bmatrix} 4.0 & -3.0 & 0.0 & 0.0 \\ -3.0 & 2.0 & 3.162278 & 0.0 \\ 0.0 & 3.162278 & -1.4 & -0.2 \\ 0.0 & 0.0 & -0.2 & 1.4 \end{bmatrix}.$$


Program 11.4 (Reduction to Tridiagonal Form). To reduce the $n \times n$ symmetric
matrix $A$ to tridiagonal form by using $n - 2$ Householder transformations.

function T=house(A)
%Input  - A is an nxn symmetric matrix
%Output - T is a tridiagonal matrix
n=size(A,1);
for k=1:n-2
   %Construct W
   s=norm(A(k+1:n,k));
   if (s==0)
      %Column k is already in tridiagonal form; skipping avoids division by r=0
      continue
   end
   if (A(k+1,k)<0)
      s=-s;
   end
   r=sqrt(2*s*(A(k+1,k)+s));
   W(1:k)=zeros(1,k);
   W(k+1)=(A(k+1,k)+s)/r;
   W(k+2:n)=A(k+2:n,k)'/r;
   %Construct V
   V(1:k)=zeros(1,k);
   V(k+1:n)=A(k+1:n,k+1:n)*W(k+1:n)';
   %Construct Q
   c=W(k+1:n)*V(k+1:n)';
   Q(1:k)=zeros(1,k);
   Q(k+1:n)=V(k+1:n)-c*W(k+1:n);
   %Form Ak
   A(k+2:n,k)=zeros(n-k-1,1);
   A(k,k+2:n)=zeros(1,n-k-1);
   A(k+1,k)=-s;
   A(k,k+1)=-s;
   A(k+1:n,k+1:n)=A(k+1:n,k+1:n) ...
      -2*W(k+1:n)'*Q(k+1:n)-2*Q(k+1:n)'*W(k+1:n);
end
T=A;

The QR Method

Suppose that $A$ is a real symmetric matrix. In the preceding section we saw how
Householder's method is used to construct a similar tridiagonal matrix. The QR
method is used to find all eigenvalues of a tridiagonal matrix. Plane rotations similar
to those that were introduced in Jacobi's method are used to construct an orthogonal
matrix $Q_1 = Q$ and an upper-triangular matrix $U_1 = U$ so that $A_1 = A$ has the
factorization

(34) $A_1 = Q_1U_1.$

Then form the product

(35) $A_2 = U_1Q_1.$

Since $Q_1$ is orthogonal, we can use (34) to see that

(36) $Q_1'A_1 = Q_1'Q_1U_1 = U_1.$

Therefore, $A_2$ can be computed with the formula

(37) $A_2 = Q_1'A_1Q_1.$

Since $Q_1' = Q_1^{-1}$, it follows that $A_2$ is similar to $A_1$ and has the same eigenvalues. In
general, construct the orthogonal matrix $Q_k$ and upper-triangular matrix $U_k$ so that

(38) $A_k = Q_kU_k.$

Then define

(39) $A_{k+1} = U_kQ_k = Q_k'A_kQ_k.$

Again, we have $Q_k' = Q_k^{-1}$, which implies that $A_{k+1}$ and $A_k$ are similar. An important
consequence is that $A_k$ is similar to $A$ and hence has the same structure. Specifically,
we can conclude that if $A$ is tridiagonal then $A_k$ is also tridiagonal for all $k$. Now
suppose that $A$ is written as

(40) $$A = \begin{bmatrix}
d_1 & e_1 & & & & \\
e_1 & d_2 & e_2 & & & \\
 & e_2 & d_3 & \ddots & & \\
 & & \ddots & d_{n-2} & e_{n-2} & \\
 & & & e_{n-2} & d_{n-1} & e_{n-1} \\
 & & & & e_{n-1} & d_n
\end{bmatrix}.$$

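We can find a plane rotation $P_{n-1}$ that reduces to zero the element of $A$ in location
$(n, n - 1)$. Thus

(41) $$P_{n-1}A = \begin{bmatrix}
d_1 & e_1 & & & & \\
e_1 & d_2 & e_2 & & & \\
 & e_2 & d_3 & \ddots & & \\
 & & \ddots & d_{n-2} & q_{n-2} & r_{n-2} \\
 & & & e_{n-2} & p_{n-1} & q_{n-1} \\
 & & & & 0 & p_n
\end{bmatrix}.$$

Continuing in a similar fashion, we can construct a plane rotation $P_{n-2}$ that will
reduce to zero the element of $P_{n-1}A$ located in position $(n - 1, n - 2)$. After $n - 1$
steps we arrive at

(42) $$P_1 \cdots P_{n-1}A = \begin{bmatrix}
p_1 & q_1 & r_1 & & & \\
0 & p_2 & q_2 & r_2 & & \\
 & 0 & p_3 & \ddots & \ddots & \\
 & & \ddots & p_{n-2} & q_{n-2} & r_{n-2} \\
 & & & 0 & p_{n-1} & q_{n-1} \\
 & & & & 0 & p_n
\end{bmatrix} = U.$$

Since each plane rotation is represented by an orthogonal matrix, equation (42) implies
that

(43) $Q = P_{n-1}'P_{n-2}' \cdots P_1'.$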
Direct multiplication of $U$ by $Q$ will produce all zero elements below the lower
second diagonal. The tridiagonal form of $A_2$ implies that it also has zeros above the
upper second diagonal. Investigation will reveal that the terms $r_j$ are used only to
compute these zero elements. Consequently, the numbers $\{r_j\}$ do not need to be stored
or used in the computer.

For each plane rotation $P_j$ it is assumed that we store the coefficients $c_j$ and $s_j$
that define it. Then we do not need to compute and store $Q$ explicitly; instead we
can use the sequences $\{c_j\}$ and $\{s_j\}$ together with the correct formulas to unravel the
product

(44) $A_2 = UQ = UP_{n-1}'P_{n-2}' \cdots P_1'.$
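
A single unshifted QR step, (34) through (37), can be tried with the built-in factorization. This is a sketch under stated assumptions: MATLAB's qr is used in place of the stored plane rotations described above (its factors may differ from them in the signs of some columns), and the tridiagonal test matrix is the one obtained in Example 11.8 with $3.162278$ written as sqrt(10).

```matlab
% Sketch: one unshifted QR step on a symmetric tridiagonal matrix.
A1 = [4 -3 0 0; -3 2 sqrt(10) 0; 0 sqrt(10) -1.4 -0.2; 0 0 -0.2 1.4];
[Q1,U1] = qr(A1);                                  % A1 = Q1*U1, equation (34)
A2 = U1*Q1;                                        % equation (35)
simErr  = norm(A2 - Q1'*A1*Q1);                    % equation (37)
bandErr = norm(triu(A2,2)) + norm(tril(A2,-2));    % entries outside the three diagonals
A2s = (A2 + A2')/2;                                % remove round-off asymmetry before eig
eigErr  = norm(sort(eig(A2s)) - sort(eig(A1)));    % eigenvalues are preserved
fprintf('simErr = %.1e, bandErr = %.1e, eigErr = %.1e\n', simErr, bandErr, eigErr);
```

All three quantities should be at round-off level, which illustrates that $A_2$ is again symmetric and tridiagonal and is similar to $A_1$.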

Acceleration Shifts

As outlined above the QR method will work, but convergence is slow even for
matrices of small dimension. We can add a shifting technique that speeds up the rate of
convergence. Recall that if $\lambda_j$ is an eigenvalue of $A$ then $\lambda_j - s_i$ is an eigenvalue of
the matrix $B = A - s_iI$. This idea is incorporated in the modified step

(45) $A_i - s_iI = Q_iU_i;$

then form

(46) $A_{i+1} = U_iQ_i$ for $i = 1, 2, \ldots, k_j$,

where $\{s_i\}$ is a sequence whose sum is $\lambda_j$; that is, $\lambda_j = s_1 + s_2 + \cdots + s_{k_j}$.

At each stage the correct amount of shift is found by using the four elements in the
lower-right corner of the matrix. Start by finding $\lambda_1$: compute the eigenvalues of
the $2 \times 2$ matrix

(47) $$\begin{bmatrix} d_{n-1} & e_{n-1} \\ e_{n-1} & d_n \end{bmatrix}.$$

They are $x_1$ and $x_2$ and are the roots of the quadratic equation

(48) $x^2 - (d_{n-1} + d_n)x + d_{n-1}d_n - e_{n-1}e_{n-1} = 0.$

The value $s_i$ in equation (45) is chosen to be the root of (48) that is closest to $d_n$.

Then QR iterating with shifting is repeated until we have $e_{n-1} \approx 0$. This will
produce the first eigenvalue $\lambda_1 = s_1 + s_2 + \cdots + s_{k_1}$. A similar process is repeated with
the upper $n - 1$ rows to obtain $e_{n-2} \approx 0$, and the next eigenvalue is $\lambda_2$. Successive
iteration is applied to smaller submatrices until we obtain $e_2 \approx 0$ and the eigenvalue $\lambda_{n-2}$.
Finally, the quadratic formula is used to find the last two eigenvalues. The details can
be gleaned from the program.
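
The choice of shift from (47) and (48), followed by one shifted step (45) and (46), can be sketched as follows. The matrix is the tridiagonal matrix obtained in Example 11.8 (entries rounded as printed there); the variable names dn1, dn, en1, shift, and Anext are assumptions introduced for this illustration, and the built-in qr again stands in for the plane rotations.

```matlab
% Sketch: pick the shift from the lower-right 2x2 block, then take one shifted step.
A = [4 -3 0 0; -3 2 3.16228 0; 0 3.16228 -1.4 -0.2; 0 0 -0.2 1.4];
n = size(A,1);
dn1 = A(n-1,n-1);  dn = A(n,n);  en1 = A(n,n-1);
x = roots([1, -(dn1 + dn), dn1*dn - en1^2]);   % roots of the quadratic (48)
[~, idx] = min(abs(x - dn));                   % root closest to d_n
shift = x(idx);
[Q,U] = qr(A - shift*eye(n));                  % equation (45)
Anext = U*Q;                                   % equation (46)
fprintf('shift = %.5f, |new e_(n-1)| = %.5f\n', shift, abs(Anext(n,n-1)));
```

For this matrix the quadratic reduces to $x^2 - 2 = 0$, so the shift is $1.41421$; the new subdiagonal entry is much smaller in magnitude than the original value $0.2$, which is the effect the shift is meant to produce.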
Example 11.9. Find the eigenvalues of the matrix

$$M = \begin{bmatrix} 4 & 2 & 2 & 1 \\ 2 & -3 & 1 & 1 \\ 2 & 1 & 3 & 1 \\ 1 & 1 & 1 & 2 \end{bmatrix}.$$

In Example 11.8, a tridiagonal matrix $A_1$ was constructed that is similar to $M$. We start
our diagonalization process with this matrix:

$$A_1 = \begin{bmatrix} 4 & -3 & 0 & 0 \\ -3 & 2 & 3.16228 & 0 \\ 0 & 3.16228 & -1.4 & -0.2 \\ 0 & 0 & -0.2 & 1.4 \end{bmatrix}.$$

The four elements in the lower-right corner are $d_3 = -1.4$, $d_4 = 1.4$, and $e_3 = -0.2$ and
are used to form the quadratic equation

$x^2 - (-1.4 + 1.4)x + (-1.4)(1.4) - (-0.2)(-0.2) = x^2 - 2 = 0.$

Calculation produces the roots $x_1 = -1.41421$ and $x_2 = 1.41421$. The root closest to $d_4$
is chosen as the first shift $s_1 = 1.41421$. The shift is subtracted from the diagonal
elements only, and the first shifted matrix is

$$A_1 - s_1I = \begin{bmatrix} 2.58579 & -3 & 0 & 0 \\ -3 & 0.58579 & 3.16228 & 0 \\ 0 & 3.16228 & -2.81421 & -0.2 \\ 0 & 0 & -0.2 & -0.01421 \end{bmatrix}.$$

Next, the factorization $A_1 - s_1I = Q_1U_1$ is computed:

$$Q_1 = \begin{bmatrix} -0.65288 & -0.38859 & -0.55535 & 0.33814 \\ 0.75746 & -0.33494 & -0.47867 & 0.29145 \\ 0 & 0.85838 & -0.43818 & 0.26680 \\ 0 & 0 & 0.52006 & 0.85413 \end{bmatrix},$$

$$U_1 = \begin{bmatrix} -3.96059 & 2.40235 & 2.39531 & 0 \\ 0 & 3.68400 & -3.47483 & -0.17168 \\ 0 & 0 & -0.38457 & 0.08024 \\ 0 & 0 & 0 & -0.06550 \end{bmatrix}.$$

Then the matrix product is computed in the reverse order to obtain

$$A_2 = U_1Q_1 = \begin{bmatrix} 4.40547 & 2.79049 & 0 & 0 \\ 2.79049 & -4.21663 & -0.33011 & 0 \\ 0 & -0.33011 & 0.21024 & -0.03406 \\ 0 & 0 & -0.03406 & -0.05595 \end{bmatrix}.$$


The second shift is $s_2 = -0.06024$, the second shifted matrix is $A_2 - s_2I = Q_2U_2$, and

$$A_3 = U_2Q_2 = \begin{bmatrix} 4.55257 & -2.65725 & 0 & 0 \\ -2.65725 & -4.26047 & 0.01911 & 0 \\ 0 & 0.01911 & 0.29171 & 0.00003 \\ 0 & 0 & 0.00003 & 0.00027 \end{bmatrix}.$$

The third shift is $s_3 = 0.00027$, the third shifted matrix is $A_3 - s_3I = Q_3U_3$, and

$$A_4 = U_3Q_3 = \begin{bmatrix} 4.62640 & 2.53033 & 0 & 0 \\ 2.53033 & -4.33489 & -0.00111 & 0 \\ 0 & -0.00111 & 0.29150 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$

The first eigenvalue, rounded to 5 decimal places, is given in the calculation

$\lambda_1 = s_1 + s_2 + s_3 = 1.41421 - 0.06023 + 0.00027 = 1.35425.$

Next $\lambda_1$ is placed in the last diagonal position of $A_4$ and the process is repeated, but
changes are made only in the upper $3 \times 3$ corner of the matrix

$$A_4 = \begin{bmatrix} 4.62640 & 2.53033 & 0 & 0 \\ 2.53033 & -4.33489 & -0.00111 & 0 \\ 0 & -0.00111 & 0.29150 & 0 \\ 0 & 0 & 0 & 1.35425 \end{bmatrix}.$$

In a similar manner one more shift reduces the entry in the second row and third
column to zero (to ten decimal places):

$s_4 = 0.29150$, $A_4 - s_4I = Q_4U_4$, $A_5 = U_4Q_4$.

Hence the second eigenvalue is

$\lambda_2 = \lambda_1 + s_4 = 1.35425 + 0.29150 = 1.64575.$

Finally, $\lambda_2$ is placed on the diagonal of $A_5$ in the third row and column to obtain

$$A_5 = \begin{bmatrix} 4.26081 & -2.65724 & 0 & 0 \\ -2.65724 & -4.55232 & 0 & 0 \\ 0 & 0 & 1.64575 & 0 \\ 0 & 0 & 0 & 1.35425 \end{bmatrix}.$$

The final computation requires finding the eigenvalues of the $2 \times 2$ matrix in the upper-left
corner of $A_5$. The characteristic equation is

$x^2 - (4.26081 - 4.55232)x + (4.26081)(-4.55232) - (2.65724)(2.65724) = 0,$

which reduces to

$x^2 + 0.29151x - 26.45749 = 0.$

The roots are $x_1 = 5.00000$ and $x_2 = -5.29150$, and the last two eigenvalues are computed
with the calculations

$\lambda_3 = \lambda_2 + x_1 = 1.64575 + 5.00000 = 6.64575,$
$\lambda_4 = \lambda_2 + x_2 = 1.64575 - 5.29150 = -3.64575.$
Program 11.5 can be used to approximate all the eigenvalues of a symmetric
tridiagonal matrix. The program follows directly from the previous discussion, but with
two notable exceptions. First, the MATLAB command eig is used to find the roots
of the characteristic equation (48) of each $2 \times 2$ submatrix (47). Second, the QR
factorization of the matrix $A_i - s_iI$ in (45) is executed using the MATLAB command
[Q,R]=qr(B), which produces an orthogonal matrix Q and an upper-triangular matrix
R, such that B=Q*R (readers will be asked to write their own QR factorization program).

Program 11.5 (The QR Method with Shifts). To approximate the eigenvalues of
a symmetric tridiagonal matrix $A$ using the QR method with shifts.

function D=qr2(A,epsilon)
%Input  - A is a symmetric tridiagonal nxn matrix
%       - epsilon is the tolerance
%Output - D is the nx1 vector of eigenvalues
%Initialize parameters
n=size(A,1);
m=n;
D=zeros(n,1);
B=A;
while (m>1)
   while (abs(B(m,m-1))>=epsilon)
      %Calculate shift: eigenvalue of the trailing 2x2 block closest to B(m,m)
      S=eig(B(m-1:m,m-1:m));
      [j,k]=min([abs(B(m,m)*[1 1]'-S)]);
      %QR factorization of the shifted matrix
      [Q,U]=qr(B-S(k)*eye(m));
      %Calculate next B (the shift is added back)
      B=U*Q+S(k)*eye(m);
   end
   %Place mth eigenvalue in A(m,m)
   A(1:m,1:m)=B;
   %Repeat process on the m-1 x m-1 submatrix of A
   m=m-1;
   B=A(1:m,1:m);
end
D=diag(A);
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Exercises for Eigenvalues of Symmetric Matrices 

1. In the proof of Theorem 11.23, carefully explain why Z is perpendicukir to W. 

2. If A is any vector and P = / — 2XX\ show that P is a symmetric malrix. 

3. Let X be tiny vector and set P = I - 2XX r . 

(a) Find the quantity P S P. 

(b) What additional condition is necessary in order that P be an orthogonal matrix? 


Algorithms and Programs 


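In Problems 1 through 6 use:

(a) Program 11.4 to reduce the given matrix to tridiagonal form.

(b) Program 11.5 to find the eigenvalues of the given matrix.

1. $\begin{bmatrix} 3 & 2 & 1 \\ 2 & 3 & 2 \\ 1 & 2 & 3 \end{bmatrix}$

2. $\begin{bmatrix} 4 & 3 & 2 & 1 \\ 3 & 4 & 3 & 2 \\ 2 & 3 & 4 & 3 \\ 1 & 2 & 3 & 4 \end{bmatrix}$

3. $\begin{bmatrix} 2.75 & -0.25 & -0.75 & 1.25 \\ -0.25 & 2.75 & 1.25 & -0.75 \\ -0.75 & 1.25 & 2.75 & -0.25 \\ 1.25 & -0.75 & -0.25 & 2.75 \end{bmatrix}$

4. $\begin{bmatrix} 3.6 & 4.4 & 0.8 & -1.6 & -2.8 \\ 4.4 & 2.6 & 1.2 & -0.4 & 0.8 \\ 0.8 & 1.2 & 0.8 & -4.0 & -2.8 \\ -1.6 & -0.4 & -4.0 & 1.2 & 2.0 \\ -2.8 & 0.8 & -2.8 & 2.0 & 1.8 \end{bmatrix}$

5. $A = [a_{ij}]$, where $a_{ij} = \begin{cases} i + j, & i = j \\ ij, & i \neq j \end{cases}$ and $i, j = 1, 2, \ldots, 30$.

6. $A = [a_{ij}]$, where $a_{ij} = \begin{cases} \cos(\sin(i + j)), & i = j \\ i + ij + j, & i \neq j \end{cases}$ and $i, j = 1, 2, \ldots, 40$.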


7. Write a program to carry out the QR method on a symmetric matrix.

8. Modify Program 11.5 to call your program from Problem 7 as a subroutine. Use this
modified program to find the eigenvalues of the matrices in Problems 1 through 6.



Appendix: 

An Introduction to MATLAB 


This appendix introduces the reader to programming with the software package
MATLAB. It is assumed that the reader has had previous experience with a high-level
programming language and is familiar with the techniques of writing loops, branching
using logical relations, calling subroutines, and editing. These techniques are directly
applicable in the windows-type environment of MATLAB.

MATLAB is a mathematical software package based on matrices. The package 
consists of an extensive library of numerical routines, easily accessed two- and three- 
dimensional graphics, and a high-level programming format. The ability to quickly 
implement and modify programs makes MATLAB an appropriate format for exploring 
and executing the algorithms in this textbook. 

The reader should work through the following tutorial introduction to MATLAB 
(MATLAB commands are in typewriter type). The examples illustrate typical input 
and output from the MATLAB Command Window. To find additional information 
about commands, options, and examples, the reader is urged to make use of the on-line 
help facility and the Reference and User’s guides that accompany the software. 

Arithmetic Operations

+          Addition
-          Subtraction
*          Multiplication
/          Division
^          Power
pi, e, i   Constants
Built-in Functions

Below is a short list of some of the functions available in MATLAB. The following
example illustrates how functions and arithmetic operations are combined. Descriptions
of other available functions may be found by using the on-line help facility.

abs(#)   cos(#)   exp(#)    log(#)     log10(#)   cosh(#)
sin(#)   tan(#)   sqrt(#)   floor(#)   acos(#)    tanh(#)

Ex. >>3*cos(sqrt(4.7))
    ans =
       -1.6869

The default format shows approximately five significant decimal figures. Entering the
command format long will display approximately 15 significant decimal figures.

Ex. >>format long
    >>3*cos(sqrt(4.7))
    ans =
       -1.68686892236893


Assignment Statements 

Variable names are assigned to expressions by using an equal sign. 

Ex. >>a=3-floor(exp(2.9))
    a =
       -15

A semicolon placed at the end of an expression suppresses the computer echo (output).

Ex. >>b=sin(a);        Note: b was not displayed.
    >>2*b^2
    ans =
       0.8457

Defining Functions 

In MATLAB the user can define a function by constructing an M-file (a file ending 
in .m) in the M-file Editor/Debugger. Once defined, a user-defined function is called 
in the same manner as built-in functions. 
Ex. Place the function $\text{fun}(x) = 1 + x - x^2/4$ in the M-file fun.m. In the
Editor/Debugger one would enter the following:
    function y=fun(x)
    y=1+x-x.^2/4;

We will explain the use of ".^" shortly. Different letters could be used for the variables
and a different name could be used for the function, but the same format would have
to be followed. Once this function has been saved as an M-file named fun.m, it can be
called in the MATLAB Command Window in the same manner as any function.
    >>cos(fun(3))
    ans =
       -0.1782

A useful and efficient way to evaluate functions is to use the feval command. This
command requires that the function name be passed as a string.

Ex. >>feval('fun',4)
    ans =
       1
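For quick experiments the same function can be defined without an M-file. The following is a minimal sketch, assuming a MATLAB version that supports anonymous functions and function handles (they are not used elsewhere in this appendix); it defines the formula from fun.m inline and evaluates it directly and through feval.

```matlab
% Sketch: the function from fun.m written as an anonymous function (assumption:
% the MATLAB version in use supports the @(x) syntax)
fun = @(x) 1 + x - x.^2/4;
v1 = fun(4);          % direct call
v2 = feval(fun, 4);   % feval also accepts a function handle
v3 = cos(fun(3));     % same computation as cos(fun(3)) in the example above
```

Both v1 and v2 reproduce the value 1 shown above, and v3 agrees with the displayed -0.1782 to four decimal places.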

Matrices 

All variables in MATLAB are treated as matrices or arrays. Matrices can be entered 
directly: 

Ex. >>A=[1 2 3;4 5 6;7 8 9]
    A =
       1 2 3
       4 5 6
       7 8 9

Semicolons are used to separate the rows of a matrix. Note that the entries of the
matrix must be separated by a single space. Alternatively, a matrix can be entered
row by row.

Ex. >>A=[1 2 3
         4 5 6
         7 8 9]
    A =
       1 2 3
       4 5 6
       7 8 9

Matrices can be generated using built-in functions. 

Ex. >>Z=zeros(3,5);    creates a 3 x 5 matrix of zeros
    >>X=ones(3,5);     creates a 3 x 5 matrix of ones
    >>Y=0:0.5:2        creates the displayed 1 x 5 matrix
    Y =
       0 0.5000 1.0000 1.5000 2.0000
    >>cos(Y)           creates a 1 x 5 matrix by taking the
                       cosine of each entry of Y
    ans =
       1.0000 0.8776 0.5403 0.0707 -0.4161


The components of matrices can be manipulated in several ways.

Ex. >>A(2,3)               select a single entry of A
    ans =
       6
    >>A(1:2,2:3)           select a submatrix of A
    ans =
       2 3
       5 6
    >>A([1 3],[1 3])       another way to select a submatrix of A
    ans =
       1 3
       7 9
    >>A(2,2)=tan(7.8);     assign a new value to an entry of A

Additional commands for matrices can be found by using the on-line help facility or
consulting the documentation accompanying the software.


Matrix Operations

+    Addition
-    Subtraction
*    Multiplication
^    Power
'    Conjugate Transpose

Ex. >>B=[1 2;3 4];
    >>C=B'               C is the transpose of B
    C =
       1 3
       2 4
    >>3*(B*C)^3          computes $3(BC)^3$
    ans =
       13080 29568
       29568 66840
Array Operations

One of the most useful characteristics of the MATLAB package is the number of
functions that can operate on the individual elements of a matrix. This was demonstrated
earlier when the cosine of the entries of a 1 x 5 matrix was taken. The matrix
operations of addition, subtraction, and scalar multiplication already operate elementwise,
but the matrix operations of multiplication, division, and power do not. These three
operations can be made to operate elementwise by preceding them with a period: .*, ./,
and .^. It is important to understand how and when to use these operations. Array
operations are crucial to the efficient construction and execution of MATLAB programs
and graphics.

Ex. >>A=[1 2;3 4];
    >>A^2                produces the matrix product AA
    ans =
       7 10
       15 22
    >>A.^2               squares each entry of A
    ans =
       1 4
       9 16
    >>cos(A./2)          divides each entry of A by 2, then takes
                         the cosine of each entry
    ans =
       0.8776 0.5403
       0.0707 -0.4161


Graphics 

MATLAB can produce two- and three-dimensional plots of curves and surfaces.
Options and additional features of graphics in MATLAB can be found in the on-line
facility and the documentation accompanying the software.

The plot command is used to generate graphs of two-dimensional functions. The
following example will create the plot of the graphs of $y = \cos(x)$ and $y = \cos^2(x)$
over the interval $[0, \pi]$.

Ex. >>x=0:0.1:pi;
    >>y=cos(x);
    >>z=cos(x).^2;
    >>plot(x,y,x,z,'o')

The first line specifies the domain with a step size of 0.1. The next two lines define the
two functions. Note that the first three lines all end in a semicolon. The semicolon is
necessary to suppress the echoing of the matrices x, y, and z on the command screen.
The fourth line contains the plot command that produces the graph. The first two terms
in the plot command, x and y, plot the function $y = \cos(x)$. The third and fourth
terms, x and z, produce the plot of $y = \cos^2(x)$. The last term, 'o', results in o's
being plotted at each point $(x_k, z_k)$, where $z_k = \cos^2(x_k)$.

In the third line the use of the array operation ".^" is critical. First the cosine of
each entry in the matrix x is taken, and then each entry in the matrix cos(x) is squared
using the .^ command.

The graphics command fplot is a useful alternative to the plot command. The
form of the command is fplot('name',[a,b],n). This command creates a plot of
the function name.m by sampling n points in the interval [a, b]. The default number
for n is 25.

Ex. >>fplot('tanh',[-2,2])    plots $y = \tanh(x)$ over $[-2, 2]$

The plot and plot3 commands are used to graph parametric curves in two- and
three-dimensional space, respectively. These commands are particularly useful in the
visualization of the solutions of differential equations in two and three dimensions.

Ex. The plot of the ellipse $c(t) = (2\cos(t), 3\sin(t))$, where $0 \le t \le 2\pi$, is produced
with the following commands:

    >>t=0:0.2:2*pi;
    >>plot(2*cos(t),3*sin(t))

Ex. The plot of the curve $c(t) = (2\cos(t), t^2, 1/t)$, where $0.1 \le t \le 4\pi$, is
produced with the following commands:

    >>t=0.1:0.1:4*pi;
    >>plot3(2*cos(t),t.^2,1./t)

Three-dimensional surface plots are obtained by specifying a rectangular subset of the
domain of a function with the meshgrid command and then using the mesh or surf
commands to obtain a graph. These graphs are helpful in visualizing the solutions of
partial differential equations.

Ex. >>x=-pi:0.1:pi;
    >>y=x;
    >>[x,y]=meshgrid(x,y);
    >>z=sin(cos(x+y));
    >>mesh(z)


Loops and Conditionals 


Relational Operators

==   Equal to
~=   Not equal to
<    Less than
>    Greater than
<=   Less than or equal to
>=   Greater than or equal to

Logical Operators

~    Not   (Complement)
&    And   (True if both operands are true)
|    Or    (True if either or both operands are true)

Boolean Values

1    True
0    False

The for, if, and while statements in MATLAB operate in a manner analogous to their
counterparts in other programming languages. These statements have the following
basic form:

for (loop-variable = loop-expression)
   executable-statements
end

if (logical-expression)
   executable-statements
else
   executable-statements
end

while (while-expression)
   executable-statements
end

The following example shows how to use nested loops to generate a matrix. The
following file was saved as an M-file named nest.m. Typing nest in the MATLAB
Command Window produces the matrix A. Note that, when viewed from the upper-left
corner, the entries of the matrix A are the entries in Pascal's triangle.

Ex. for i=1:5
       A(i,1)=1;A(1,i)=1;
    end
    for i=2:5
       for j=2:5
          A(i,j)=A(i,j-1)+A(i-1,j);
       end
    end
    A

The break command is used to exit from a loop. 

Ex. for k=1:100
       x=sqrt(k);
       if ((k>10)&(x-floor(x)==0))
          break
       end
    end
    k

The disp command can be used to display text or a matrix. 

Ex. n=10;
    k=0;
    while k<=n
       x=k/3;
       disp([x x^2 x^3])
       k=k+1;
    end

Programs 

An efficient way to construct programs is to use user-defined functions. These
functions are saved as M-files. These programs allow the user to specify the input and
output parameters. They are easily called as subroutines in other programs. The
following example allows one to visualize the effect of reducing Pascal's triangle modulo
a prime number. Type the following function in the MATLAB Editor/Debugger and
then save it as an M-file named pasc.m.

Ex. function P=pasc(n,m)
    %Input  - n is the number of rows
    %       - m is the prime number
    %Output - P is Pascal's triangle
    for j=1:n
       P(j,1)=1;P(1,j)=1;
    end
    for k=2:n
       for j=2:n
          P(k,j)=rem(P(k,j-1),m)+rem(P(k-1,j),m);
       end
    end

Now in the MATLAB Command Window enter P=pasc(5,3) to see the first five rows
of Pascal's triangle mod 3. Or try P=pasc(175,3); (note the semicolon) and then type
spy(P) (generates a sparse matrix for large values of n).
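The recurrence in pasc can also be run as a plain script, which is convenient for checking a few entries by hand. The following is a minimal sketch using the same recurrence; the preallocation with zeros and the choice n = 5, m = 3 are illustrative assumptions, not part of the original function.

```matlab
% Sketch: the pasc recurrence as a script, with the matrix preallocated
n = 5;  m = 3;            % assumed illustrative sizes
P = zeros(n);             % preallocate instead of growing P inside the loops
P(:,1) = 1;               % first column of ones
P(1,:) = 1;               % first row of ones
for k = 2:n
    for j = 2:n
        % each entry adds the remainders of its left and upper neighbors
        P(k,j) = rem(P(k,j-1),m) + rem(P(k-1,j),m);
    end
end
disp(P)
```

Note that, exactly as in pasc, the sum itself is not reduced modulo m, so an entry can be as large as 2(m-1).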

Conclusion 

At this point the reader should be able to create and modify programs based on the
algorithms in this textbook. Additional information on commands and information
regarding the use of MATLAB on your particular platform can be found in the on-line
help facility or in the documentation accompanying the software.
Answers to Selected Exercises


Section 1.1 Review of Calculus 

1. (a) L = 2 , (e„} = f ^ j, = 0 

3. (a) $c = 1 - \sqrt{2}$

4. (a) $M_1 = -5/4$, $M_2 = 5$

5. (a) $c = 0$

6. (a) $c = 1$

7. $c = 4/3$

9. (a) $x^2 \cos(x)$

10. (a) $c = \pm\sqrt{13}/3$

11. (a) 2 (b) 1

15. 1 3,t/3, apply the Mean Value Theorem for Integrals 

16. Let the $n$ roots of $P(x)$ be $x_0, x_1, \ldots, x_{n-1}$. Verify that the hypotheses of the
Generalized Rolle's Theorem are satisfied. Therefore, there exists $c \in (a, b)$
such that $P^{(n-1)}(c) = 0$.

Section 1.2 Binary Numbers 

1. (a) The computer's answer is not 0 because 0.1 is not an exact binary fraction.
(b) 0 (exactly)

2. (a) 21 (c) 254

3. (a) 0.84375 (c) 0.6640625

4. (a) 1.4140625

5. (a) $\sqrt{2} - 1.4140625 = 0.000151062\ldots$

6. (a) $10111_{\text{two}}$ (c) $101111010_{\text{two}}$

7. (a) $0.0111_{\text{two}}$ (c) $0.10111_{\text{two}}$


9. (a) 0.006250000 ... 

11. Use c — and r = ^ to get S = = ? 

13. (a) i ^ O.lOilrwo x 2” 1 = 0.1011^0 x2~ ] 

1 % D.llOltwo x 2“ 2 = O.OllOlcwo x2"' 

^ 0.10001 l two x 2“° 

^ ^ 0.100l two x 2”° = O.lOOlwo x 2° 

\ as 0.101 l,wo x 2” 2 = 0.00101 ltwo x 2"° 

Tii 0.1011 Jl two X 2 c 

14. (a) $10 = 101_{\text{three}}$ (c) $421 = 120121_{\text{three}}$

15. (a) J=G.W (b) J=0.!*„* 

16. (a) $10 = 20_{\text{five}}$ (c) $721 = 10341_{\text{five}}$

17. (b) ± = 0.2 fivc 


Section 1.3 Error Analysis

1. (a) $x = 2.71828182$, $\hat{x} = 2.7182$, $(x - \hat{x}) = 0.00008182$,
$(x - \hat{x})/x = 0.00003010$, four significant digits

,11 1 1 292,807 

2 -; + ^ + 5^ = rra - n - 2 «3074428 = * 

p - P = 0.0000000178, ( p - p)!p = 0.0300000699 

3. (a) $p_1 + p_2 = 1.414 + 0.09125 = 1.505$, $p_1 p_2 = (1.414)(0.09125) = 0.1290$

4. The error involves loss of significance.

$$\frac{0.70711385222 - 0.70710678119}{0.00001} = \frac{0.00000707103}{0.00001} = 0.707103$$

5. (a) $\ln((x + 1)/x)$ or $\ln(1 + 1/x)$ (c) $\cos(2x)$
6. (a) $P(2.72) = (2.72)^3 - 3(2.72)^2 + 3(2.72) - 1 = 20.12 - 22.19 + 8.16 - 1$
$= -2.07 + 8.16 - 1 = 6.09 - 1 = 5.09$

$Q(2.72) = ((2.72 - 3)2.72 + 3)2.72 - 1 = ((-0.28)2.72 + 3)2.72 - 1$
$= (-0.7616 + 3)2.72 - 1 = (2.238)2.72 - 1 = 6.087 - 1 = 5.087$

$R(2.72) = (2.72 - 1)^3 = (1.72)^3 = 5.088$

7. (a) 0.498 (b) 0.499

1 n- 

9. (a) --- - cos(h) =2 + h + —h 3 + 0{h*) 

1 — ft 2 

(b) !-rcos(ft) =1+1 j + ?-+~ + 0(h 4 ) 

1 — ft 2 2 

Section 2.1 Iteration for Solving x = g(x)

1. (a) $g \in C[0, 1]$, $g$ maps $[0, 1]$ onto $[3/4, 1] \subset [0, 1]$, and $|g'(x)| = |-x/2| = x/2 \le 1/2 < 1$
on $[0, 1]$. Therefore, the hypotheses of Theorem 2.2 are satisfied
and $g$ has a unique fixed point on $[0, 1]$.

2. (a) $g(2) = -4 + 8 - 2 = 2$, $g(4) = -4 + 16 - 8 = 4$

(b) p_0 = 1.9            E_0 = 0.1            R_0 = 0.05
    p_1 = 1.795          E_1 = 0.205          R_1 = 0.1025
    p_2 = 1.5689875      E_2 = 0.4310125      R_2 = 0.21550625
    p_3 = 1.04508911     E_3 = 0.95491089     R_3 = 0.477455444

(e) The sequence in part (b) does not converge to P = 2. The sequence in part (c)

4. $P = 2$, $g'(2) = 5$, iteration will not converge to $P = 2$

8. $P = 2n\pi$ where $n$ is any integer, $g'(P) = 1$; Theorem 2.3 gives no information
regarding convergence.

9. (a) $g(3) = 0.5(3) + 1.5 = 3$

(c) Proof by mathematical induction. If $n = 1$, then $|P - p_1| = |P - p_0|/2$
by part (b). Induction hypothesis: Assume that $|P - p_k| = |P - p_0|/2^k$. Show that the
statement is true for $n = k + 1$:

$|P - p_{k+1}| = |P - p_k|/2$          (by part (b))
$= (|P - p_0|/2^k)/2$          (induction hypothesis)
$= |P - p_0|/2^{k+1}$


10.. (a) 


IP+t-t -P*l 
Section 2.2 Bracketing Methods for Locating a Root

1. $I_0 = (0.11 + 0.12)/2 = 0.115$          $A(0.115) = 254{,}403$
   $I_1 = (0.11 + 0.115)/2 = 0.1125$        $A(0.1125) = 246{,}072$
   $I_2 = (0.1125 + 0.115)/2 = 0.11375$     $A(0.11375) = 250{,}198$

3. There are many choices for intervals $[a, b]$ on which $f(a)$ and $f(b)$ have opposite
sign. The following answers are one such choice.

(a) $f(1) < 0$ and $f(2) > 0$, so there is a root in $[1, 2]$; also $f(-1) < 0$ and
$f(-2) > 0$, so there is a root in $[-2, -1]$.

(c) $f(3) < 0$ and $f(4) > 0$, so there is a root in $[3, 4]$.

4. $c_0 = -1.8300782$, $c_1 = -1.8409252$, $c_2 = -1.8413854$, $c_3 = -1.8414048$

6. $c_0 = 3.6979549$, $c_1 = 3.6935108$, $c_2 = 3.6934424$, $c_3 = 3.6934414$

11. Find N such that < 5 x 10 -9 . 

14. The bisection method will never converge (assuming that $c_n \ne 2$) to $x = 2$.

Section 2.3 Initial Approximation and Convergence Criteria

1. There is a root near x = -0.7. The interval [-1, 0] could be used.

3. There is a root near x = 1. The interval [-2, 2] could be used.

5. There is one root near x = 1.4. The interval [1, 2] could be used. There is a
second root near x = 3. The interval [2, 4] could be used.

Section 2.4 Newton-Raphson and Secant Methods 

l.(a )w = 

(b) $p_0 = -1.5$, $p_1 = 0.125$, $p_2 = 2.6458$, $p_3 = 1.1651$

3. (a) $p_k = g(p_{k-1}) = \frac{3}{4} p_{k-1} + \frac{1}{2}$

(b) $p_0 = 2.1$, $p_1 = 2.075$, $p_2 = 2.0561$, $p_3 = 2.0421$, $p_4 = 2.0316$

5. (a) $p_k = g(p_{k-1}) = p_{k-1} + \cos(p_{k-1})$

7. (a) $g(p_{k-1}) = p_{k-1}^2/(p_{k-1} - 1)$

(b) p_0 = 0.20                  (c) p_0 = 20.0
    p_1 = -0.05                     p_1 = 21.05263158
    p_2 = -0.002380953              p_2 = 22.10250034
    p_3 = -0.000005655              p_3 = 23.14988809
    p_4 = -0.000000000              p_4 = 24.19503505
    lim p_k = 0.0                   lim p_k = infinity
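The two columns of iterates in Problem 7 can be regenerated in a few lines. The following is a minimal sketch; the variable names pb and pc and the use of an anonymous function are illustrative assumptions, while the iteration function and starting values are those of the answer above.

```matlab
% Sketch: iterate p_k = g(p_{k-1}) with g(p) = p^2/(p - 1)
g = @(p) p.^2./(p - 1);
pb = 0.20;                 % starting value for part (b)
pc = 20.0;                 % starting value for part (c)
for k = 1:4
    pb(k+1) = g(pb(k));    % iterates shrink rapidly toward 0
    pc(k+1) = g(pc(k));    % iterates grow by roughly 1 per step
end
disp([pb' pc'])
```

Starting near 0 the error is roughly squared at each step, whereas starting at 20 the iterates drift away, which matches the two limits listed above.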


8. $p_0 = 2.6$, $p_1 = 2.5$, $p_2 = 2.41935484$, $p_3 = 2.41436464$
9. Solution of $\cos(x) - 1 = 0$.

   n    p_n (Steffensen's)
   0    0.5
   1    0.24465808
   2    0.12171517
   3    0.00755300
   4    0.00377648
   5    0.00188824
   6    0.00000003

11. The sum of the infinite series is S = 99.

   n    S_n           T_n
   1    0.99          98.9999988
   2    1.9701        99.0000017
   3    2.940399      98.9999988
   4    3.90099501    98.9999992
   5    4.85198506
   6    5.79346521

13. The sum of the infinite series is S = 4.

15. Muller's method for $f(x) = x^3 - x - 2$.

   n    p_n           f(p_n)
   0    1.0           -2.0
   1    1.2           -1.472
   2    1.4           -0.656
   3    1.52495614     0.02131598
   4    1.52135609    -0.00014040
   5    1.52137971    -0.00000001


Section 3.1 Introduction to Vectors and Matrices

1. (i)(a) (1, 4) (b) (5, -12) (c) (9, -12) (d) 5 (e) (-26, 72)
(f) -38 (g) $2\sqrt{1465}$

2. $\theta = \arccos(-16/21) \approx 2.437045$ radians

3. (a) Assume that $X, Y \ne 0$. $X \cdot Y = 0$ iff $\cos(\theta) = 0$ iff $\theta = (2n + 1)\frac{\pi}{2}$ iff $X$
and $Y$ are orthogonal.


6 . (c) aji = 


Ji 

J ’ ji + i 


j=i 

j*i 


U i = j 

i ~ ij +J i 


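Section 3.2 Properties of Vectors and Matrices

1. $AB = \begin{bmatrix} -11 & -12 \\ 13 & -24 \end{bmatrix}$, $BA = \begin{bmatrix} -15 & 10 \\ -12 & -20 \end{bmatrix}$

3. (a) $(AB)C = A(BC) = \begin{bmatrix} 2 & -5 \\ -88 & -56 \end{bmatrix}$

5. (a) 33 (c) The determinant does not exist because the matrix is not square.

8. $(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = (AI)A^{-1} = AA^{-1} = I$. Similarly,
$(B^{-1}A^{-1})(AB) = I$. Therefore, $(AB)^{-1} = B^{-1}A^{-1}$.

10. (a) $MN$ (b) $M(N - 1)$

14. $XX' = [6]$, $X'X = \begin{bmatrix} 1 & -1 & 2 \\ -1 & 1 & -2 \\ 2 & -2 & 4 \end{bmatrix}$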

Section 3.3 Upper-triangular Linear Systems

1. $x_1 = 2$, $x_2 = -2$, $x_3 = 1$, $x_4 = 3$, and $\det A = 120$

5. $x_1 = 3$, $x_2 = 2$, $x_3 = 1$, $x_4 = -1$, and $\det A = -24$


Section 3.4 Gaussian Elimination and Pivoting

1. $x_1 = -3$, $x_2 = 2$, $x_3 = 1$

5. $y = 5 - 3x + 2x^2$

10. $x_1 = 1$, $x_2 = 3$, $x_3 = 2$, $x_4 = -2$

15. (a) Solution for Hilbert matrix A:
$x_1 = 25$, $x_2 = -300$, $x_3 = 1050$, $x_4 = -1400$, $x_5 = 630$
(b) Solution for the other matrix A:
$x_1 = 28.02304$, $x_2 = -348.5887$, $x_3 = 1239.781$,
$x_4 = -1656.785$, $x_5 = 753.5564$

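Section 3.5 Triangular Factorization

1. (a) Y' = [-4 12 3], X' = [-3 2 1]

(b) Y' = [20 39 9], X' = [5 7 3]

3. (a) $\begin{bmatrix} -5 & 2 & -1 \\ 1 & 0 & 3 \\ 3 & 1 & 6 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ -0.2 & 1 & 0 \\ -0.6 & 5.5 & 1 \end{bmatrix} \begin{bmatrix} -5 & 2 & -1 \\ 0 & 0.4 & 2.8 \\ 0 & 0 & -10 \end{bmatrix}$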

5. (a) Y' = [8 -6 12 2] 

7 

II 

>4 

1 2] 


(b) Y' = [28 6 12 1], X' = [3 1 2 l] 
6. The triangular factorization A = LU is


-3 -1 —1.75 1 


Section 3.6 Iterative Methods for Linear Systems


0 -4 -10 

0 0 -7.5 


1. (a) Jacobi iteration
$P_1 = (3.75, 1.8)$
$P_2 = (4.2, 1.05)$
$P_3 = (4.0125, 0.96)$
Iteration will converge to (4, 1).

(b) Gauss-Seidel iteration
$P_1 = (3.75, 1.05)$
$P_2 = (4.0125, 0.9975)$
$P_3 = (3.999375, 1.000125)$
Iteration will converge to (4, 1).

3. (a) Jacobi iteration
$P_1 = (-1, -1)$
$P_2 = (-4, -4)$
$P_3 = (-13, -13)$
The iteration diverges away from the solution P = (0.5, 0.5).

(b) Gauss-Seidel iteration
$P_1 = (-1, -4)$
$P_2 = (-13, -40)$
$P_3 = (-121, -361)$
The iteration diverges away from the solution P = (0.5, 0.5).

5. (a) Jacobi iteration
$P_1 = (2, 1.375, 0.75)$
$P_2 = (2.125, 0.96875, 0.90625)$
$P_3 = (2.0125, 0.95703125, 1.0390625)$
Iteration will converge to P = (2, 1, 1).

(b) Gauss-Seidel iteration
$P_1 = (2, 0.875, 1.03125)$
$P_2 = (1.96875, 1.01171875, 0.989257813)$
$P_3 = (2.00449219, 0.99753418, 1.0017395)$
Iteration will converge to P = (2, 1, 1).
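The iterates in Problem 1 can be reproduced with a short script. The following is a minimal sketch under a stated assumption: the system 4x - y = 15, x + 5y = 9 and the starting point (0, 0) are not given in this answer key and were inferred from the listed iterates, so they should be checked against the exercise itself.

```matlab
% Sketch: three Jacobi and Gauss-Seidel steps for an inferred 2x2 system
A = [4 -1; 1 5];          % assumed coefficient matrix (inferred from the iterates)
b = [15; 9];              % assumed right-hand side (inferred from the iterates)
PJ = [0; 0];              % Jacobi iterate, assumed start
PG = [0; 0];              % Gauss-Seidel iterate, assumed start
for k = 1:3
    % Jacobi: both components are computed from the previous iterate
    PJ = [(b(1) - A(1,2)*PJ(2))/A(1,1); (b(2) - A(2,1)*PJ(1))/A(2,2)];
    % Gauss-Seidel: the new first component is used immediately
    PG(1) = (b(1) - A(1,2)*PG(2))/A(1,1);
    PG(2) = (b(2) - A(2,1)*PG(1))/A(2,2);
end
disp([PJ PG])
```

After three steps PJ and PG agree with the values of P_3 listed for parts (a) and (b), and the Gauss-Seidel iterate is visibly closer to (4, 1).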


9. (15): $\|X\|_1 = \sum_{k=1}^{N} |x_k| = 0$ iff $|x_k| = 0$ for $k = 1, 2, \ldots, N$ iff $X = 0$

(16): $\|cX\|_1 = \sum_{k=1}^{N} |c x_k| = \sum_{k=1}^{N} |c|\,|x_k| = |c| \sum_{k=1}^{N} |x_k| = |c|\,\|X\|_1$


Section 3.7 Iteration for Nonlinear Systems

1. (a) $x = 0$, $y = 0$ (c) $x = 0$, $y = 2n\pi$

2. (a) $x = 4$, $y = -2$ (c) $x = 0$, $y = (2n + 1)\pi/2$
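7. $0 = x^2 - y - 0.2$, $0 = y^2 - x - 0.3$

(a) Solution of the linear system $J(P_k)\,dP = -F(P_k)$, followed by $P_k + dP$:

$$\begin{bmatrix} 2.4 & -1.0 \\ -1.0 & 2.4 \end{bmatrix} \begin{bmatrix} -0.0075630 \\ 0.0218487 \end{bmatrix} = -\begin{bmatrix} 0.04 \\ -0.06 \end{bmatrix}, \qquad P_0 + dP = \begin{bmatrix} 1.192437 \\ 1.221849 \end{bmatrix}$$

$$\begin{bmatrix} 2.384874 & -1.0 \\ -1.0 & 2.443697 \end{bmatrix} \begin{bmatrix} -0.0001278 \\ -0.0002476 \end{bmatrix} = -\begin{bmatrix} 0.0000572 \\ 0.0004774 \end{bmatrix}, \qquad P_1 + dP = \begin{bmatrix} 1.192309 \\ 1.221601 \end{bmatrix}$$

Therefore, $(p_1, q_1) = (1.192437, 1.221849)$ and
$(p_2, q_2) = (1.192309, 1.221601)$.

(b) Solution of the linear system $J(P_k)\,dP = -F(P_k)$, followed by $P_k + dP$:

$$\begin{bmatrix} -0.4 & -1.0 \\ -1.0 & -0.4 \end{bmatrix} \begin{bmatrix} -0.0904762 \\ 0.0761905 \end{bmatrix} = -\begin{bmatrix} 0.04 \\ -0.06 \end{bmatrix}, \qquad P_0 + dP = \begin{bmatrix} -0.2904762 \\ -0.1238095 \end{bmatrix}$$

$$\begin{bmatrix} -0.5809524 & -1.0 \\ -1.0 & -0.2476190 \end{bmatrix} \begin{bmatrix} 0.0044128 \\ 0.0056223 \end{bmatrix} = -\begin{bmatrix} 0.0081859 \\ 0.0058050 \end{bmatrix}, \qquad P_1 + dP = \begin{bmatrix} -0.2860634 \\ -0.1181872 \end{bmatrix}$$

Therefore, $(p_1, q_1) = (-0.2904762, -0.1238095)$ and
$(p_2, q_2) = (-0.2860634, -0.1181872)$.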
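The two Newton steps of part (a) can be checked numerically. The following is a minimal sketch, assuming the starting point (1.2, 1.2), which is inferred from the first Jacobian matrix shown above; the names F, J, P, and dP are illustrative.

```matlab
% Sketch: two steps of Newton's method for the system in Problem 7
F = @(P) [P(1)^2 - P(2) - 0.2; P(2)^2 - P(1) - 0.3];   % the two equations
J = @(P) [2*P(1) -1; -1 2*P(2)];                       % Jacobian matrix of F
P = [1.2; 1.2];            % assumed starting point (inferred from J(P_0))
for k = 1:2
    dP = J(P) \ (-F(P));   % solve J(P_k) dP = -F(P_k)
    P = P + dP;            % next iterate P_k + dP
end
disp(P')
```

Replacing the starting point by (-0.2, -0.2) gives the iterates of part (b) in the same way.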

8. (b) The values of the Jacobian determinant at the solution points are $|J(1, 1)| = 0$
and $|J(-1, -1)| = 0$. Newton's method depends on being able to solve a
linear system where the matrix is $J(p_n, q_n)$ and $(p_n, q_n)$ is near a solution. For
this example, the system equations are ill conditioned and thus hard to solve with
precision. In fact, for some values near a solution we have $|J(x_0, y_0)| = 0$, for
example, $|J(1.0001, 1.0001)| = 0$.

12. (a) Note: As with derivatives, we have $\frac{\partial}{\partial x}(c f(x, y)) = c \frac{\partial}{\partial x} f(x, y)$. $F(X)$ was
defined as $F(X) = [f_1(x_1, \ldots, x_n) \; \cdots \; f_m(x_1, \ldots, x_n)]'$; thus, by scalar
multiplication, $cF(X) = [c f_1(x_1, \ldots, x_n) \; \cdots \; c f_m(x_1, \ldots, x_n)]'$. $J(cF(X)) = [j_{ik}]_{m \times n}$,
where $j_{ik} = \frac{\partial}{\partial x_k}(c f_i(x_1, \ldots, x_n)) = c \frac{\partial}{\partial x_k} f_i(x_1, \ldots, x_n)$. Therefore, by the
definition of scalar multiplication, we have $J(cF(X)) = cJ(F(X))$.
Section 4.1 Taylor Series and Calculation of Functions

1. (a) $P_5(x) = x - x^3/3! + x^5/5!$
$P_7(x) = x - x^3/3! + x^5/5! - x^7/7!$
$P_9(x) = x - x^3/3! + x^5/5! - x^7/7! + x^9/9!$

(b) $|E_9(x)| = |\sin(c)\,x^{10}/10!| \le (1)(1)^{10}/10! = 0.0000002755$

(c) $P_5(x) = 2^{-1/2}(1 + (x - \pi/4) - (x - \pi/4)^2/2 - (x - \pi/4)^3/6 + (x - \pi/4)^4/24 + (x - \pi/4)^5/120)$

3. At $x_0 = 0$ the derivatives of $f(x)$ are undefined. But at $x_0 = 1$ the derivatives are
defined.

5. $P_3(x) = 1 + 0x - x^2/2 + 0x^3 = 1 - x^2/2$

8. (a) $f(2) = 2$, $f'(2) = \frac{1}{4}$, $f''(2) = -\frac{1}{32}$, $f^{(3)}(2) = \frac{3}{256}$
$P_3(x) = 2 + (x - 2)/4 - (x - 2)^2/64 + (x - 2)^3/512$

(b) $P_3(1) = 1.732421875$; compare with $3^{1/2} = 1.732050808$

(c) $f^{(4)}(x) = -15(2 + x)^{-7/2}/16$; the maximum of $|f^{(4)}(x)|$ on the interval
$1 \le x \le 3$ occurs when $x = 1$, and $|f^{(4)}(x)| \le |f^{(4)}(1)| \le 3^{-7/2}(15/16) \approx 0.020046$.
Therefore, $|E_3(x)| \le \frac{(0.020046)(1)^4}{4!} = 0.00083529$

I3„ (d) ^(0.5) = 0.41666667 14. (d) P 2 (0.5) = 1.21875000 

P 6 (0.5) = 0.40468750 A (0.5) = 1.22607422 

P 9 (0,5) = 0.40553230 P 6 (0.5) = 1.22660828 

In (1.5) = 0.40546511 (1.5 > I/2 = 1.22474487 

Section 4.2 Introduction to Interpolation 

1. (a) Use $x = 4$ and get $b_3 = -0.02$, $b_2 = 0.02$, $b_1 = -0.12$, $b_0 = 1.18$. Hence
$P(4) = 1.18$.

(b) Use $x = 4$ and get $d_2 = -0.06$, $d_1 = -0.04$, $d_0 = -0.36$. Hence $P'(4) = -0.36$.

(c) Use $x = 4$ and get $i_4 = -0.005$, $i_3 = 0.01333333$, $i_2 = -0.04666667$,
$i_1 = 1.47333333$, $i_0 = 5.89333333$. Hence $I(4) = 5.89333333$. Similarly, use
$x = 1$ and get $I(1) = 1.58833333$.
$\int_1^4 P(x)\,dx = I(4) - I(1) = 5.89333333 - 1.58833333 = 4.305$

(d) Use $x = 5.5$ and get $b_3 = -0.02$, $b_2 = -0.01$, $b_1 = -0.255$, $b_0 = 0.2575$.
Hence $P(5.5) = 0.2575$.
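The coefficients $b_k$ and $d_k$ above come from nested multiplication (synthetic division). The following is a minimal sketch; the polynomial $P(x) = -0.02x^3 + 0.1x^2 - 0.2x + 1.66$ is an assumption, inferred from the listed intermediate values rather than stated in this answer key.

```matlab
% Sketch: evaluate P(4) and P'(4) by nested multiplication
a = [-0.02 0.1 -0.2 1.66];   % assumed coefficients, highest power first
x = 4;
b = a(1);                    % running value for P(x), starts at b_3
d = 0;                       % running value for P'(x)
for k = 2:numel(a)
    d = d*x + b;             % derivative recurrence uses the previous b
    b = b*x + a(k);          % value recurrence: b_{k-1} = a_{k-1} + x*b_k
end
fprintf('P(4) = %.4f, P''(4) = %.4f\n', b, d);
```

With these coefficients the loop passes through b = 0.02, -0.12, 1.18 and d = -0.02, -0.06, -0.36, so the final values agree with parts (a) and (b).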
Section 4.3 Lagrange Approximation

1. (a) $P_1(x) = -1(x - 0)/(-1 - 0) + 0 = x + 0 = x$

(b) $P_2(x) = -1\,\frac{(x - 0)(x - 1)}{(-1 - 0)(-1 - 1)} + 0 + \frac{(x + 1)(x - 0)}{(1 + 1)(1 - 0)}$
$= -0.5(x)(x - 1) + 0.5(x)(x + 1) = 0x^2 + x + 0 = x$

(d) $P_1(x) = 1(x - 2)/(1 - 2) + 8(x - 1)/(2 - 1) = 7x - 6$

5. (c) $f^{(4)}(c) = 120(c - 1)$ for all $c$; thus $E_3(x) = 5(x + 1)(x)(x - 3)(x - 4)(c - 1)$

10. $|f^{(2)}(c)| \le |-\sin(1)| = 0.84147098 = M_2$
(a) $h^2 M_2/8 = h^2(0.84147098)/8 < 5 \times 10^{-7}$

12. (a) $z = 3 - 2x + 4y$

Section 4.4 Newton Polynomials 

1. $P_1(x) = 4 - (x - 1)$
$P_2(x) = 4 - (x - 1) + 0.4(x - 1)(x - 3)$
$P_3(x) = P_2(x) + 0.01(x - 1)(x - 3)(x - 4)$
$P_4(x) = P_3(x) - 0.002(x - 1)(x - 3)(x - 4)(x - 4.5)$
$P_1(2.5) = 2.5$, $P_2(2.5) = 2.2$, $P_3(2.5) = 2.21125$, $P_4(2.5) = 2.21575$

5. $f(x) = 3(2)^x$
$P_4(x) = 1.5 + 1.5(x + 1) + 0.75(x + 1)(x) + 0.25(x + 1)(x)(x - 1) + 0.0625(x + 1)(x)(x - 1)(x - 2)$
$P_1(1.5) = 5.25$, $P_2(1.5) = 8.0625$, $P_3(1.5) = 8.53125$, $P_4(1.5) = 8.47265625$

7. $f(x) = 3.6/x$
$P_4(x) = 3.6 - 1.8(x - 1) + 0.6(x - 1)(x - 2) - 0.15(x - 1)(x - 2)(x - 3) + 0.03(x - 1)(x - 2)(x - 3)(x - 4)$
$P_1(2.5) = 0.9$, $P_2(2.5) = 1.35$, $P_3(2.5) = 1.40625$, $P_4(2.5) = 1.423125$
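A Newton polynomial is conveniently evaluated by nested multiplication, working from the highest-order coefficient down. The following is a minimal sketch for $P_4(2.5)$ in Problem 1; the variable names c, xc, and t are illustrative assumptions, while the coefficients and centers are read off the answer above.

```matlab
% Sketch: nested evaluation of the Newton polynomial P_4 from Problem 1
c  = [4 -1 0.4 0.01 -0.002];   % divided-difference coefficients
xc = [1 3 4 4.5];              % centers x_0, x_1, x_2, x_3
t  = 2.5;                      % evaluation point
p = c(end);
for k = numel(c)-1:-1:1
    p = p*(t - xc(k)) + c(k);  % peel off one factor (t - x_k) at a time
end
disp(p)
```

The same loop with c = [3.6 -1.8 0.6 -0.15 0.03] and xc = [1 2 3 4] handles Problem 7.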


Section 4.5 Chebyshev Polynomials 

9. (a) $\ln(x + 2) \approx 0.69549038 + 0.49905042x - 0.14334605x^2 + 0.04909073x^3$

(b) $|f^{(4)}(x)|/(2^3(4!)) \le |-6|/(2^3(4!)) = 0.03125000$

11. (a) $\cos(x) \approx 1 - 0.46952087x^2$

(b) $|f^{(3)}(x)|/(2^2(3!)) \le |\sin(1)|/(2^2(3!)) = 0.03506129$

13. The error bound for Taylor's polynomial is

$$\frac{|f^{(8)}(x)|}{8!} \le \frac{|\sin(1)|}{8!} = 0.00002087.$$

The error bound for the minimax approximation is

$$\frac{|f^{(8)}(x)|}{2^7(8!)} \le \frac{|\sin(1)|}{2^7(8!)} = 0.00000016.$$

Section 4.6 Padé Approximations

1. $1 = p_0$, $1 + q_1 = p_1$, $\frac{1}{2} + q_1 = 0$. $q_1 = -\frac{1}{2}$, $p_1 = \frac{1}{2}$
$e^x \approx R_{1,1}(x) = (2 + x)/(2 - x)$

3. 1 = po, i + 2*1/15 = pi, ± + *,/3 = 0. q, = Pl = -1 
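5. $1 = p_0$, $1 + q_1 = p_1$, $\frac{1}{2} + q_1 + q_2 = p_2$.

First solve the system

$$\frac{1}{6} + \frac{q_1}{2} + q_2 = 0, \qquad \frac{1}{24} + \frac{q_1}{6} + \frac{q_2}{2} = 0.$$

Then $q_1 = -\frac{1}{2}$, $q_2 = \frac{1}{12}$, $p_1 = \frac{1}{2}$, $p_2 = \frac{1}{12}$.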


1 2 

7. (a) 1 = Po> ~ +qi = pi, — -h ^i/3 H- ^r 2 = pi¬ 
ll 2 gi a? 

-1- — + — = 0 

315 15 3 

First solve the system 

+ iZil | ^ A 

2835 315 15 

xu 41 II 

Section 5.1 Least-squares Line

1. (a) $10A + 0B = 7$
$0A + 5B = 13$
$y = 0.70x + 2.60$, $E_2(f) \approx 0.2449$

2. (a) $40A + 0B = 58$
$0A + 5B = 31.2$
$y = 1.45x + 6.24$, $E_2(f) \approx 0.8958$

3. (c) $\sum_{k=1}^{5} x_k y_k \Big/ \sum_{k=1}^{5} x_k^2 = 86.9/55 = 1.58$
$y = 1.58x$, $E_2(f) \approx 0.1720$

11. (a) $y = 1.6866x^2$, $E_2(f) \approx 1.3$
$y = 0.5902x^3$, $E_2(f) \approx 0.29$. This is the best fit.

Section 5.2 Curve Fitting 

1. (a) $164A + 20C = 186$
$20B = -34$
$20A + 4C = 26$
$y = 0.875x^2 - 1.70x + 2.125 = \frac{7}{8}x^2 - \frac{17}{10}x + \frac{17}{8}$

3. (a) $15A + 5B = -0.8647$
$5A + 5B = 4.2196$
$y = 3.8665e^{-0.5084x}$, $E_1(f) \approx 0.10$

6 .___ 

Using linearization Minimizing least squares 

1000 1000 

8 14 4.30l8e- loS02( r±4.213L- ] - 045, S 7 

5000 _ 5000 

' 1 4 8.9991e- a8I138( 1 4 8.9987e“ 0 sl 


18. (a) $14A + 15B + 8C = 82$
$15A + 19B + 9C = 93$
$8A + 9B + 5C = 49$
$A = 2.4$, $B = 1.2$, $C = 3.8$ yields $z = 2.4x + 1.2y + 3.8$


Section 5.3 Interpolation by Spline Functions

4. $h_0 = 1$   $d_0 = -2$

$h_1 = 3$   $d_1 = 1$   $u_1 = 18$

$h_2 = 3$   $d_2 = -2/3$   $u_2 = -10$

Solve the system

$\tfrac{15}{2}m_1 + 3m_2 = 21$

$3m_1 + \tfrac{21}{2}m_2 = -15$

to get $m_1 = \tfrac{118}{31}$ and $m_2 = -\tfrac{78}{31}$. Then $m_0 = -\tfrac{152}{31}$ and $m_3 = \tfrac{272}{93}$. The cubic spline is

$S_0(x) = \tfrac{45}{31}(x+3)^3 - \tfrac{76}{31}(x+3)^2 - (x+3) + 2$,   $-3 \le x \le -2$

$S_1(x) = -\tfrac{98}{279}(x+2)^3 + \tfrac{59}{31}(x+2)^2 - \tfrac{48}{31}(x+2)$,   $-2 \le x \le 1$

$S_2(x) = \tfrac{253}{837}(x-1)^3 - \tfrac{39}{31}(x-1)^2 + \tfrac{12}{31}(x-1) + 3$,   $1 \le x \le 4$

5. $h_0 = 1$   $d_0 = -2$

$h_1 = 3$   $d_1 = 1$   $u_1 = 18$

$h_2 = 3$   $d_2 = -2/3$   $u_2 = -10$

Solve the system

$8m_1 + 3m_2 = 18$

$3m_1 + 12m_2 = -10$

to get $m_1 = \tfrac{82}{29}$ and $m_2 = -\tfrac{134}{87}$. Set $m_0 = 0 = m_3$. The cubic spline is

$S_0(x) = \tfrac{41}{87}(x+3)^3 - \tfrac{215}{87}(x+3) + 2$,   $-3 \le x \le -2$

$S_1(x) = -\tfrac{190}{783}(x+2)^3 + \tfrac{41}{29}(x+2)^2 - \tfrac{92}{87}(x+2)$,   $-2 \le x \le 1$

$S_2(x) = \tfrac{67}{783}(x-1)^3 - \tfrac{67}{87}(x-1)^2 + \tfrac{76}{87}(x-1) + 3$,   $1 \le x \le 4$

6. $h_0 = 1$   $d_0 = -2$

$h_1 = 3$   $d_1 = 1$   $u_1 = 18$

$h_2 = 3$   $d_2 = -2/3$   $u_2 = -10$

Solve the system

$\tfrac{28}{3}m_1 + \tfrac{8}{3}m_2 = 18$

$0m_1 + 18m_2 = -10$

to get $m_1 = \tfrac{263}{126}$ and $m_2 = -\tfrac{5}{9}$. Then $m_0 = \tfrac{187}{63}$ and $m_3 = -\tfrac{403}{126}$. The cubic spline is

$S_0(x) = -\tfrac{37}{252}(x+3)^3 + \tfrac{187}{126}(x+3)^2 - \tfrac{841}{252}(x+3) + 2$,   $-3 \le x \le -2$

$S_1(x) = -\tfrac{37}{252}(x+2)^3 + \tfrac{263}{252}(x+2)^2 - \tfrac{17}{21}(x+2)$,   $-2 \le x \le 1$

$S_2(x) = -\tfrac{37}{252}(x-1)^3 - \tfrac{5}{18}(x-1)^2 + \tfrac{125}{84}(x-1) + 3$,   $1 \le x \le 4$

Section 5.4 Fourier Series and Trigonometric Polynomials 


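1. $f(x) = \dfrac{4}{\pi}\left(\sin(x) + \dfrac{\sin(3x)}{3} + \dfrac{\sin(5x)}{5} + \dfrac{\sin(7x)}{7} + \cdots\right)$


3. $f(x) = \dfrac{\pi}{4} + \sum_{j=1}^{\infty}\left(\dfrac{(-1)^j - 1}{\pi j^2}\right)\cos(jx) - \sum_{j=1}^{\infty}\left(\dfrac{(-1)^j}{j}\right)\sin(jx)$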
12. /GO = 6 + 2| E“, (M£l) “ s (■¥) 
Section 6.1 Approximating The Derivative

1. f(x) = sin(x)

h        Approximate f'(x),    Error in the      Bound for the
         formula (3)           approximation     truncation error

0.1      0.695546112           0.001160597       0.001274737
0.01     0.696695100           0.000011609       0.000012747
0.001    0.696706600           0.000000109       0.000000127
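
The tabulated approximations can be reproduced with the two standard central-difference formulas. The following is a minimal sketch under stated assumptions: the function names are hypothetical, formula (3) is taken to be the O(h^2) central difference and formula (10) the O(h^4) one, and the evaluation point x = 0.8 is inferred from the tabulated values (they are consistent with cos(0.8)), since it is not stated in the answer itself.

```python
import math


def central_diff_o2(f, x, h):
    """O(h^2) central difference: (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def central_diff_o4(f, x, h):
    """O(h^4) central difference using the four points x +/- h and x +/- 2h."""
    return (-f(x + 2.0 * h) + 8.0 * f(x + h)
            - 8.0 * f(x - h) + f(x - 2.0 * h)) / (12.0 * h)


if __name__ == "__main__":
    x = 0.8  # assumed evaluation point, inferred from the table
    for h in (0.1, 0.01, 0.001):
        approx = central_diff_o2(math.sin, x, h)
        print(h, approx, math.cos(x) - approx)
```

For very small h the subtraction f(x+h) - f(x-h) loses significant digits, which is why the error does not keep shrinking indefinitely as h decreases.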


3. f(x) = sin(x)

h       Approximate f'(x),    Error in the      Bound for the
        formula (10)          approximation     truncation error

0.1     0.696704390            0.000002320      0.000002322
0.01    0.696706710           -0.000000001      0.000000000


5. $f(x) = x^3$ (a) $f'(2) \approx 12.0025000$ (b) $f'(2) \approx 12.0000000$

(c) For part (a): $O(h^2) = -(0.05)^2 f^{(3)}(c)/6 = -0.0025000$. For part (b):
$O(h^4) = -(0.05)^4 f^{(5)}(c)/30 = -0.0000000$


7. f(x, y) = xy/(x + y)

(a) $f_x(x, y) = (y/(x + y))^2$, $f_x(2, 3) = 0.36$

h        Approximation to    Error in the
         f_x(2, 3)           approximation

0.1      0.360144060         -0.000144060
0.01     0.360001400         -0.000001400
0.001    0.360000000          0.000000000


10. (a) Formula (3) gives $I'(1.2) \approx -13.5840$ and $E(1.2) \approx 11.3024$. Formula (10)
gives $I'(1.2) \approx -13.6824$ and $E(1.2) \approx 11.2975$.

(b) Using differentiation rules from calculus, we obtain $I'(1.2) \approx -13.6793$ and
$E(1.2) \approx 11.2976$.






Section 6.2 Numerical Differentiation Formulas

1. f(x) = ln(x)

(a) $f''(5) \approx -0.040001600$ (b) $f''(5) \approx -0.040007900$

(c) $f''(5) \approx -0.039999833$ (d) $f''(5) = -0.04000000 = -1/5^2$

The answer in part (b) is most accurate.

3. f(x) = ln(x)

(a) $f''(5) \approx 0.0000$ (b) $f''(5) \approx -0.0400$

(c) $f''(5) \approx 0.0133$ (d) $f''(5) = -0.0400 = -1/5^2$

The answer in part (b) is most accurate.

5. (a) $f(x) = x^2$, $f''(1) \approx 2.0000$

(b) $f(x) = x^4$, $f''(1) \approx 12.0002$
Section 7.1 Introduction to Quadrature

1. (a) f(x) = sin(πx)

       trapezoidal rule              0.0
       Simpson’s rule                0.666667
       Simpson’s 3/8 rule            0.649519
       Boole’s rule                  0.636165

   (c) f(x) = sin(√x)

       trapezoidal rule              0.420735
       Simpson’s rule                0.573336
       Simpson’s 3/8 rule            0.583143
       Boole’s rule                  0.593376

2. (a) f(x) = sin(πx)

       Composite trapezoidal rule    0.603553
       Composite Simpson rule        0.638071
       Boole’s rule                  0.636165

   (b) f(x) = sin(√x)

       Composite trapezoidal rule    0.577889
       Composite Simpson rule        0.592124
       Boole’s rule                  0.593376

Section 7.2 Composite Trapezoidal and Simpson’s Rule 

1. (a) $F(x) = \arctan(x)$, $F(1) - F(-1) = \pi/2 \approx 1.57079632679$

(i): M = 10, h = 0.2, $T(f, h) = 1.56746305691$, $E_T(f, h) = 0.00333326989$

(ii): M = 5, h = 0.2, $S(f, h) = 1.57079538809$, $E_S(f, h) = 0.00000093870$

(c) $F(x) = 2\sqrt{x}$, $F(4) - F(1/4) = 3$

(i): M = 10, h = 0.375, $T(f, h) = 3.04191993765$,

$E_T(f, h) = -0.04191993765$

(ii): M = 5, h = 0.375, $S(f, h) = 3.00762208163$, $E_S(f, h) = -0.00762208163$

2. (a) $\int_0^1 \sqrt{1 + 9x^4}\,dx = 1.54786565469019$

(i): M = 10, $T(f, 1/10) = 1.55260945$

(ii): M = 5, $S(f, 1/10) = 1.54786419$

3. (a) $2\pi \int_0^1 x^3\sqrt{1 + 9x^4}\,dx = 3.5631218520124$

(i): M = 10, $T(f, 1/10) = 3.64244664$

(ii): M = 5, $S(f, 1/10) = 3.56372816$

8. (a) Use the bound $|f^{(2)}(x)| = |-\cos(x)| \le |\cos(0)| = 1$, and obtain
$((\pi/3 - 0)h^2)/12 \le 5 \times 10^{-9}$; then substitute $h = \pi/(3M)$ and get
$\pi^3/162 \times 10^8 \le M^2$. Solve and get $4374.89 \le M$; since M must be an integer, M = 4375
and h = 0.000239359.

9. (a) Use the bound $|f^{(4)}(x)| = |\cos(x)| \le |\cos(0)| = 1$, and obtain
$((\pi/3 - 0)h^4)/180 \le 5 \times 10^{-9}$; then substitute $h = \pi/(6M)$ and get
$\pi^5/34{,}992 \times 10^7 \le M^4$; since M must be an integer, M = 18 and h = 0.029088821.
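
The arithmetic in Exercises 8(a) and 9(a) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the code simply solves the two error-bound inequalities quoted above for the smallest integer M, assuming the derivative bounds equal 1 on [0, π/3] as stated in the answers.

```python
import math


def trapezoid_subintervals(length, max_f2, tol):
    """Smallest M with length * h**2 * max|f''| / 12 <= tol, where h = length / M."""
    return math.ceil(math.sqrt(length ** 3 * max_f2 / (12.0 * tol)))


def simpson_subinterval_pairs(length, max_f4, tol):
    """Smallest M with length * h**4 * max|f''''| / 180 <= tol, where h = length / (2M)."""
    return math.ceil((length ** 5 * max_f4 / (180.0 * 16.0 * tol)) ** 0.25)


length = math.pi / 3.0  # the interval [0, pi/3] from the exercises
tol = 5e-9

m_trap = trapezoid_subintervals(length, 1.0, tol)
m_simp = simpson_subinterval_pairs(length, 1.0, tol)
h_trap = length / m_trap
h_simp = length / (2 * m_simp)

if __name__ == "__main__":
    print(m_trap, h_trap)
    print(m_simp, h_simp)
```

The contrast between the two values of M illustrates how much the higher-order error term of Simpson's rule reduces the work needed for the same tolerance.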
h         T(f, h)      E_T(f, h) = O(h^2)

0.2       0.1990008    0.0006660
0.1       0.1995004    0.0001664
0.05      0.1996252    0.0000416
0.025     0.1996564    0.0000104
0.0125    0.1996642    0.0000026


Section 7.3 Recursive Rules and Romberg Integration 

1. (a)

J    R(J, 0)        R(J, 1)       R(J, 2)

0    -0.00171772
1     0.02377300    0.03226990
2     0.60402717    0.79744521    0.84845691

(c)

J    R(J, 0)       R(J, 1)       R(J, 2)

0    2.88
1    2.10564024    1.84752031
2    1.78167637    1.67368841    1.66209962
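
The entries to the right of the first column follow from the trapezoidal column by Richardson extrapolation. The following is a minimal sketch, with the hypothetical helper name `romberg_extend`, that rebuilds the tableau of part (a) from its R(J, 0) column using R(J, K) = R(J, K-1) + (R(J, K-1) - R(J-1, K-1)) / (4^K - 1).

```python
def romberg_extend(first_column):
    """Build the Romberg tableau R(J, K) from the trapezoidal column R(J, 0)."""
    table = [[value] for value in first_column]
    for j in range(1, len(table)):
        for k in range(1, j + 1):
            prev = table[j][k - 1]
            # Richardson step: cancel the leading h^(2k) error term.
            table[j].append(prev + (prev - table[j - 1][k - 1]) / (4 ** k - 1))
    return table


# R(J, 0) values copied from the table for Exercise 1(a).
tableau = romberg_extend([-0.00171772, 0.02377300, 0.60402717])

if __name__ == "__main__":
    for row in tableau:
        print(row)
```

Because the inputs are rounded to eight decimals, the recomputed entries can differ from the printed ones in the last digit.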

10. (ii) For $\int_0^1 \sqrt{x}\,dx$, Romberg integration converges slowly because the higher
derivatives of the integrand $f(x) = \sqrt{x}$ are not bounded near x = 0.


Section 7.5 Gauss-Legendre Integration

1. $\int_0^2 6t^5\,dt = 64$ (b) $G(f, 2) = 58.6666667$

3. $\int_0^1 \sin(t)/t\,dt \approx 0.9460831$ (b) $G(f, 2) = 0.9460411$

6. (a) N = 4 (b) N = 6

8. If the fourth derivative does not change too much, then $\left|\dfrac{f^{(4)}(c_1)}{135}\right| < \left|\dfrac{-f^{(4)}(c_2)}{90}\right|$.

The truncation error term for the Gauss-Legendre rule will be less than the truncation
error term for Simpson’s rule.

Section 8.1 Minimization of a Function 

3. (a) $f(x) = 4x^3 - 8x^2 - 11x + 5$; $f'(x) = 12x^2 - 16x - 11$;
local minimum at $x = \tfrac{11}{6}$

(d) $f(x) = e^x/x^2$; $f'(x) = e^x(x - 2)/x^3$; local minimum at x = 2
7. (a) $f(x, y) = x^3 + y^3 - 3x - 3y + 5$

$f_x(x, y) = 3x^2 - 3$, $f_y(x, y) = 3y^2 - 3$

Critical points: (1, 1), (1, -1), (-1, 1), (-1, -1)

Local minimum at (1, 1)

(c) $f(x, y) = x^2y + xy^2 - 3xy$

$f_x(x, y) = 2xy + y^2 - 3y$, $f_y(x, y) = x^2 + 2xy - 3x$
Critical points: (0, 0), (0, 3), (3, 0), (1, 1)

Local minimum at (1, 1)

11. “Reflecting” the triangle through the side BG implies that the terminal points
of the vectors W, M, and R all lie on the same line segment. Thus, by the
definition of scalar multiplication and vector addition, we have R - W = 2(M - W)
or R = 2M - W.

Section 9.1 Introduction to Differential Equations 

1. (b) L = 1 3. (b) L = 3 5. (b) L = 60

10. (c) No, because $f_y(t, y) = \tfrac{1}{2}y^{-2/3}$ is not continuous when t = 0,
and $\lim_{y \to 0} f_y(t, y) = \infty$.
13. $y(t) = t^3 - \cos(t) + 3$
15. $y(t) = \int_0^t e^{-s^2/2}\,ds$

17. (b) $y(t) = y_0 e^{-0.00012096t}$ (c) 2808 years (d) 6.9237 seconds

Section 9.2 Euler’s Method 


t_k    y_k (h = 0.1)    y_k (h = 0.2)

0.0    1                1
0.1    0.90000
0.2    0.81100          0.80000
0.3    0.73390
0.4    0.66951          0.64800
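
The table above can be reproduced with a few lines of code. This is a minimal sketch under a loudly stated assumption: the answer key does not restate the differential equation, so the right-hand side y' = t^2 - y with y(0) = 1 used below is a guess chosen because it matches the tabulated values; the names `euler` and `assumed_rhs` are hypothetical.

```python
def euler(f, t0, y0, h, steps):
    """Euler's method: y_{k+1} = y_k + h * f(t_k, y_k); returns the list of (t_k, y_k)."""
    t, y = t0, y0
    values = [(t, y)]
    for _ in range(steps):
        y = y + h * f(t, y)
        t = t + h
        values.append((t, y))
    return values


def assumed_rhs(t, y):
    # ASSUMPTION: y' = t**2 - y is not stated in the answer key; it is inferred
    # from the tabulated values and used here only for illustration.
    return t * t - y


fine = euler(assumed_rhs, 0.0, 1.0, 0.1, 4)    # h = 0.1 column
coarse = euler(assumed_rhs, 0.0, 1.0, 0.2, 2)  # h = 0.2 column

if __name__ == "__main__":
    for t, y in fine:
        print(round(t, 1), round(y, 5))
```

Halving the step size roughly halves the final global error of Euler's method, which is why the two columns differ noticeably at t = 0.4.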
h      y_h          (16y_h - y_{2h})/15

1      1.6701860
1/2    1.6694308    1.6693805
1/4    1.6693928    1.6693903
1/8    1.6693906    1.6693905
Section 9.5 Runge-Kutta Methods

t_k    y_k (h = 0.1)    y_k (h = 0.2)

0      1                1
0.1    0.90516
0.2    0.82127          0.82127
0.3    0.74918
0.4    0.68968          0.68969


t_k    y_k (h = 0.1)    y_k (h = 0.2)

0      1                1
0.1    0.99501
0.2    0.98020          0.98020
0.3    0.95600
0.4    0.92312          0.92312


Section 9.6 Predictor-Corrector Methods 

1. $y_4 = 0.82126825$, $y_5 = 0.78369923$

3. $y_4 = 0.74832050$, $y_5 = 0.66139979$

4. $y_4 = 0.98247692$, $y_5 = 0.97350099$
7. $y_4 = 1.1542232$, $y_5 = 1.2225213$

Section 9.7 Systems of Differential Equations 

1. (a) $(x_1, y_1) = (-2.5500000, 2.6700000)$

$(x_2, y_2) = (-2.4040735, 2.5485015)$

(b) $(x_1, y_1) = (-2.5521092, 2.6742492)$

5. (b) $x' = y$

$y' = 1.5x + 2.5y + 22.5e^{2t}$

(c) $x_1 = 2.05$, $x_2 = 2.17$

(d) $x_1 = 2.0875384$

Section 9.8 Boundary Value Problems 

2. No; $q(t) = -1/t^2 < 0$ for all $t \in [0.5, 4.5]$.

Section 9.9 Finite-difference Method 

1. (a) $h_1 = 0.5$, $x_1 = 7.2857149$

$h_2 = 0.25$, $x_1 = 6.0771913$, $x_2 = 7.2827443$


2. (a) $h_1 = 0.5$, $x_1 = 0.85414295$

$h_2 = 0.25$, $x_1 = 0.93524622$, $x_2 = 0.83762911$


Section 10.1 Hyperbolic Equations

t_j    x_2         x_3         x_4         x_5

0.0    0.587785    0.951057    0.951057    0.587785
0.1    0.475528    0.769421    0.769421    0.475528
0.2    0.181636    0.293893    0.293893    0.181636


t_j    x_2      x_3      x_4      x_5

0.0    0.500    1.000    1.500    0.750
0.1    0.500    1.000    0.875    0.800
0.2    0.500    0.375    0.300    0.125


Section 10.2 Parabolic Equations

3.

x_1 = 0.0    x_2 = 0.2    x_3 = 0.4    x_4 = 0.6    x_5 = 0.8    x_6 = 1.0

0.0          0.587785     0.951057     0.951057     0.587785     0.0
0.0          0.475528     0.769421     0.769421     0.475528     0.0
0.0          0.384710     0.622475     0.622475     0.384710     0.0


Section 10.3 Elliptic Equations

1. (a) $-4p_1 + p_2 + p_3 = -80$
$p_1 - 4p_2 + p_4 = -10$

$p_1 - 4p_3 + p_4 = -160$

$p_2 + p_3 - 4p_4 = -90$

(b) $p_1 = 41.25$, $p_2 = 23.75$, $p_3 = 61.25$, $p_4 = 43.75$

5. (a) $u_{xx} + u_{yy} = 2a + 2c = 0$, if a = -c

6. Determine if u(x, y) = cos(2x) + sin(2y) is a solution, since it is also defined on
the interior of R; that is, $u_{xx} + u_{yy} = -4\cos(2x) - 4\sin(2y) = -4(\cos(2x) + \sin(2y)) = -4u$.


Section 11.1 Homogeneous Systems: The Eigenvalue Problem 

1. (a) $|A - \lambda I| = \lambda^2 - 3\lambda - 4 = 0$ implies that $\lambda_1 = -1$ and $\lambda_2 = 4$. Substituting
each eigenvalue into $|A - \lambda I| = 0$ and solving gives $V_1 = [-1\ \ 1]'$ and $V_2 = [2/3\ \ 1]'$, respectively.

If $\lambda = 2$ is an eigenvalue of A corresponding to the vector V, then AV = 2V.
Premultiply both sides by $A^{-1}$: $A^{-1}AV = A^{-1}(2V)$ or $V = 2A^{-1}V$. Thus
$A^{-1}V = \tfrac{1}{2}V$.


Section 11.2 Power Method 


1. $(A - \alpha I)V = AV - \alpha IV = AV - \alpha V = \lambda V - \alpha V = (\lambda - \alpha)V$. Thus $(\lambda - \alpha)$,
V is an eigenpair of $A - \alpha I$.

5. (a) $|A - 1I| = \begin{vmatrix} -0.2 & 0.3 \\ 0.2 & -0.3 \end{vmatrix} = 0$

(b) $\left[\begin{array}{rr|r} -0.2 & 0.3 & 0 \\ 0.2 & -0.3 & 0 \end{array}\right]$ is equivalent to $\left[\begin{array}{rr|r} -0.2 & 0.3 & 0 \\ 0 & 0 & 0 \end{array}\right]$;

thus -0.2x + 0.3y = 0.

Let y = t, then x = 3t/2. Thus the eigenvectors associated with $\lambda = 1$ are
$\{t[3/2\ \ 1]' : t \in \Re,\ t \ne 0\}$.

(c) The eigenvector from part (b) implies that in the long run the 50,000 members
of the population will be divided 3 to 2 in their preference for brands X and Y,
respectively; that is, [30,000 20,000]'.
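
The long-run split in part (c) can also be observed by repeatedly applying the transition matrix. The following is a minimal sketch under stated assumptions: the matrix A = [[0.8, 0.3], [0.2, 0.7]] is reconstructed from the displayed A - 1I, while the even starting split and the helper name `apply_matrix` are hypothetical choices made only for illustration.

```python
def apply_matrix(matrix, vector):
    """Return the matrix-vector product as a plain list."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


# A = I + (A - I), with A - I = [[-0.2, 0.3], [0.2, -0.3]] taken from part (a).
transition = [[0.8, 0.3], [0.2, 0.7]]

# ASSUMPTION: an even initial split of the 50,000 members (not given in the answer).
state = [25000.0, 25000.0]
for _ in range(200):
    state = apply_matrix(transition, state)

if __name__ == "__main__":
    print(state)
```

Each column of the assumed matrix sums to 1, so the total population is preserved at every step and only the split between the two brands changes.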


Section 11.3 Jacobi’s Method 

3. (a) The eigenpairs of $A = \begin{bmatrix} 4 & 2 \\ 3 & -1 \end{bmatrix}$ are 5, [2 1]', and -2, [-1/3 1]'. Thus the

general solution is $X(t) = c_1e^{5t}[2\ \ 1]' + c_2e^{-2t}[-1/3\ \ 1]'$. Set t = 0 to solve
for $c_1$ and $c_2$; that is, $[1\ \ 2]' = c_1[2\ \ 1]' + c_2[-1/3\ \ 1]'$. Thus $c_1 = 0.7143$ and
$c_2 = 1.2857$.


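Section 11.4 Eigenvalues of Symmetric Matrices

1. From (3) we have $W = \dfrac{X - Y}{\|X - Y\|_2}$ and, from Figure 11.4, $Z = \tfrac{1}{2}(X + Y)$.
Taking the dot product,

$W \cdot Z = \dfrac{X - Y}{\|X - Y\|_2} \cdot \dfrac{1}{2}(X + Y) = \dfrac{(X - Y) \cdot (X + Y)}{2\|X - Y\|_2}$

$= \dfrac{X \cdot X + X \cdot Y - Y \cdot X - Y \cdot Y}{2\|X - Y\|_2} = \dfrac{\|X\|_2^2 - \|Y\|_2^2}{2\|X - Y\|_2} = 0,$

since X and Y have the same norm.

2. $P' = (I - 2XX')' = I' - 2(XX')' = I - 2(X')'X' = I - 2XX' = P$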


Index 


A 

Accelerating Convergence 

Aitken’s process, 90, 99 (#10—#14) 
Newton-Raphson, 71, 82, 88 (#23), 
176 

Steffensen’s method, 90, 95
Adams-Bashforth-Moulton method, 474,
482

Adaptive Quadrature, 382, 387 
Aitken’s process, 90, 99 (#10-#14)
Approximate significant digits, 25 
Approximation of data 

least-squares curves, 211, 257 
least-squares line, 255, 258 
least-squares polynomial, 271, 274 
Approximation of functions 

Chebyshev polynomial, 230, 233, 
238, 240 

Lagrange polynomial, 207, 211, 213,
217, 238 

least squares, 255, 257, 271 
Newton polynomial, 220, 224, 227 
Pade approximation, 243, 246 
rational functions, 243 
splines, 280, 281, 285, 293
Taylor polynomials, 8, 26, 31, 189


Augmented matrix, 126, 129 

B 

Back substitution, 121, 123, 136 
Backward difference, 334 
Basis, 557 

Binary numbers, 13, 17, 19 
Binomial series, 197 (#14) 

Bisection method, 53, 54, 59 
Bolzano’s method, 53 
Boole’s rule, 344, 372, 375, 380 (#3, #4), 
389 (#3) 

Boundary value problems, 497, 503, 505,
510

Bracketing methods, 51, 53 

C 

Central difference, 313, 314, 329, 340 (#7,
#8)

Characteristic polynomial, 559 
Chebyshev nodes, 232 
Chebyshev polynomial 

interpolation, 230, 233, 238, 240 
minimization, 233 
nodes, 234 
Chopped number, 27 


Note: Numbers in parentheses refer to problem numbers in exercises.

Composite Simpson’s rule, 350, 354, 359,
363 

Composite trapezoidal rule, 350, 354, 358, 
363 

Computer accuracy, 21 
Continuous function, 3 
Convergence 

acceleration, 82, 87 (#21-#23), 90, 
92, 95

criteria, 62, 66 
global (local), 62 
linear, 76, 77, 90

Newton-Raphson, 77, 82, 87 (#21, 
#23) 

order of, 32, 75 

quadratic, 76, 77, 82, 87 (#21, #23) 
sequence, 3 

series, 8, 99 (#10-#14)
speed, 75 

Corrector formula, 475, 477 
Crank-Nicholson method, 531, 535
Cube-root algorithm, 86 (#11) 

Cubic spline 

clamped, 284, 285, 293 
natural, 284, 285 

D 

D'Alembert’s solution, 519 
Deflation of eigenvalues, 578 
Derivative 

definition, 5, 311 

formulas, 204, 313, 322, 329, 333, 
505, 517, 527, 538
higher, 329, 333, 505 
partial, 325 (#7), 517, 527, 538 
polynomials, 204, 334, 336 
Determinant, 113, 114, 123, 151
Difference 

backward, 334 

central, 313, 314, 329, 340 (#7, #8) 
divided, 223 

finite-difference method, 505, 510,
514, 517, 527, 539
forward, 334, 341 (#13) 
table, 224 

Difference equation, 505, 517, 527, 531, 
539 

Differential equation 


Adams-Bashforth-Moulton method,

474, 482 

boundary value problems, 497, 503, 
505, 510 

Crank-Nicholson method, 531, 535
Dirichlet method for Laplace’s equation, 549

Euler’s method, 433, 437, 440 
existence-uniqueness, 430 
finite-difference method, 505, 510, 
514, 517, 527, 539
forward-difference method, 528, 533 
Hamming’s method, 484 
Heun's method, 443, 445, 448, 465 
higher-order equations, 490 
initial value problem, 428, 430, 487, 
498 

Milne-Simpson method, 477, 483
modified Euler method, 465 
partial differential equations, 514, 
516, 526, 538
predictor, 474, 477 

Runge-Kutta method, 458, 461, 466, 
468, 488, 502

Runge-Kutta-Fehlberg method, 466, 
469 

shooting method, 498, 503 
stability of solutions, 478, 481
Taylor methods, 451, 452, 455

Digit 

binary, 14, 17, 19 
decimal, 14, 19, 22 

Dirichlet method for Laplace’s equation, 
549 

Distance between points, 103, 162 
Divided differences, 223 
Division 

by zero, 74, 77
synthetic, 10, 200 
Dot product, 103 
Double precision, 22 
Double root, 75, 77, 87 (#21)

E 

Eigenvalues 

characteristic polynomial, 559 
definition, 559 
dominant, 568 
Householder’s method, 594 


inverse power method, 573, 575, 576 
Jacobi’s method, 581 
power method, 568, 570, 573, 576
QR method, 601, 606
Eigenvectors 

definition, 559 
dominant, 568 

Elementary row operations, 126 
Elementary transformations, 125 
Elliptic equations, 538 
Endpoint constraints for splines, 284 
Epidemic model, 442 (#9) 

Equivalent linear systems, 125 
Error 

absolute, 24 
bound, 189, 194,213 
computer, 21, 27, 135 
data, 36, 203, 316

differential equations, 437, 445, 452, 
462, 475, 477, 519 
differentiation, 313, 314, 316, 318 
integration, 344, 358, 359, 377 
interpolating polynomial, 189, 213, 
233 

loss of significance, 28 
propagation, 32 
relative, 24, 66 
root-mean-square, 253
round-off, 27 
sequence, 3 
stable (unstable), 33 
subtractive cancellation, 28 
truncation, 26, 313, 314 
Euclidean norm, 103, 162, 163 
Euler formulas, 299
Euler’s method, 433, 437, 440
global error, 437 
modified, 465 
systems, 488 
Even function, 300 
Exponential fit, 263 
Extrapolated value, 199 
Extrema, 400, 404 
Extreme Value Theorem, 4 

F 

False position method, 56, 60 
Final global error, 437, 445, 452, 462 


Finite difference method, 505, 510, 514,
517, 527, 539 

Fixed-point iteration, 42, 49, 173 
error bound, 46 
Floating-point number, 21, 22 
accuracy, 21 

Forward difference, 334, 341 (#13) 
Forward difference method, 527, 528, 533
Forward substitution, 125 (#2) 

Fourier series, 299 
discrete, 304 
Fractions, binary, 17 
Fundamental theorem of calculus, 6 

G 

Gauss-Legendre integration, 389, 392, 394 
Gauss-Seidel iteration, 159, 161, 164 
Gaussian elimination, 125, 128, 143, 150 
back substitution, 121, 123 
computational complexity, 147 
LU factorization, 141, 143, 150
multipliers, 127, 129 
pivoting, 127, 131 

tridiagonal systems, 140 (#1), 166 
(#3), 284, 506, 599 

Generalized Rolle’s theorem, 6, 198 (#20)
Geometric series, 16, 51
Gerschgorin’s circle theorem, 566
Golden ratio search, 401, 412 
Gradient, 412, 420 
Graphical analysis 

fixed-point iteration, 47 
Newton’s method, 70, 78, 79 
secant method, 80 

H 

Halley’s method, 87 (#22) 

Hamming’s method, 484 
Heat equation, 515 
Helmholtz’s equation, 533, 548 
Heun’s method, 443, 445, 448, 465 
Higher derivatives, 329, 333 
Hilbert matrix, 139 (#15) 

Hooke’s law, 262 (#1) 

Horner’s method, 10, 200
Householder’s method, 594 
Hyperbolic equations, 516 

I 

Ill-conditioning 
least-squares data fitting, 134
matrices, 133, 139 (#15)

Initial value problem, 428, 430, 487, 498
Integration 

adaptive quadrature, 382, 387 
Boole’s rule, 344, 372, 375, 380 (#3, 
#4), 389 (#3) 

composite rules, 350, 354, 358, 363
cubic splines, 296 (#12)

Gauss-Legendre integration, 389, 
392, 394 

midpoint rule, 366 (#12), 381 (#11) 
Newton-Cotes, 344 
Romberg integration, 373, 375, 377, 
378, 381 (#11)

Simpson’s rule, 344, 353 (#9), 354, 
359, 363, 370, 380 (#6), 387
Trapezoidal rule, 344, 354, 358, 363, 
368, 377 

Intermediate Value Theorem, 3 
Interpolation 

Chebyshev polynomials, 230, 233, 
238, 240 

cubic splines, 281, 285-287, 293 
error, polynomials, 8, 31, 189, 211, 
213, 238 

extrapolation, 199 
integration, 296 (#12), 344 
Lagrange polynomials, 207, 211, 213,
217, 238 

least squares, 255, 271 
linear, 207, 219 (#12), 255, 277 (#17),
280 

Newton polynomials, 220, 224, 227
Padé approximations, 243, 246
piecewise linear, 280 
polynomial wiggle, 273 
rational functions, 243 
Runge phenomenon, 236
Taylor polynomials, 8, 26, 31, 189, 
313, 329 

trigonometric polynomials, 297, 303, 
306 

Iteration methods 

bisection, 53, 54, 59 
fixed point, 42, 49, 173, 544 
Gauss-Seidel, 159, 161, 164 
Jacobi iteration, 156, 161, 163 
Muller, 92, 97


Newton, 70, 82, 84, 88 (#23), 176, 
179 

partial differential equations, 546 
regula falsi, 56, 60 
secant, 80, 84, 87 (#20) 

Steffensen, 92, 95 

J 

Jacobi iteration for linear systems, 156, 
161, 163 

Jacobi’s method for eigenvalues, 581, 590 
Jacobian matrix, 170, 176 

L 

Lagrange polynomials, 207, 211, 213, 236
Laplace’s equation, 538, 549 
Least-squares data fitting 
data linearization, 266 
linear fit, 255, 258, 260 (#7), 277 
(#17) 

nonlinear fit, 257, 266, 271
plane, 277 (#17, #18)
polynomial fit, 271, 274
root-mean-square error, 253 
trigonometric polynomials, 297, 303, 
306 

Length of a curve, 364 (#2)

Length of a vector, 103, 162, 163
Limit 

function, 2 
sequence, 3 
series, 8 

Linear approximation, 219 (#12), 255, 258,
277 (#17), 280
Linear combination, 103, 499
Linear convergence, 76, 77, 90
Linear independence, 557 
Linear least-squares fit, 255, 258, 260 (#7), 
277 (#17) 

Linear system, 114, 121, 128, 143, 152, 
156, 163 

Linear systems of equations 

back substitution, 121, 123, 136
forward substitution, 125 (#2) 
Gaussian elimination, 125, 128, 143, 
150 

LU factorization, 141, 143, 150 
tridiagonal systems, 140 (#1), 166 
(#3), 284, 506, 599 


Linear systems, theory 

matrix form, 111, 114, 127, 141
nonsingular, 114 
Lipschitz condition, 430 
Location of roots, 68 

Logistic rule of population growth, 276 
(#6, #7) 

Loss of significance, 28 
Lower triangular determinant, 123 
LU factorization, 141, 143, 150

M 

Machine numbers, 20 
Maclaurin series, 243 
Mantissa, 20, 22
Markov process, 579 (#5) 

Matrix 

addition, 107 
augmented, 126, 129 
determinant, 113, 114, 123, 151 
diagonalization, 563 
eigenvalue, 559 
eigenvector, 559 
equality, 106 
Hilbert, 139 (#15) 
identity, 112 

ill-conditioned, 133, 139 (#15) 
inverse, 112, 114

lower triangular, 120, 125 (#2), 143
LU factorization, 141, 143, 150 
multiplication, 110, 112, 143, 150 
nonsingular, 112 
norm, 566 

orthogonal, 565, 594
permutation, 148, 150 
singular, 113 

strictly diagonally dominant, 160,
162, 163 

symmetric, 109 (#6), 565, 581, 590, 
594 

transpose, 104, 108 (#5), 270
triangular, 120, 125 (#2) 
tridiagonal, 140 (#1), 166 (#3), 284, 
506, 599

Mean of data, 260 (#4, #5, #6) 

Mean value theorems 
derivative, 5, 45 
integrals, 6 
intermediate, 3 


weighted integral, 7 
Midpoint rule, 366 (#12), 381 (#11)
Milne-Simpson method, 477, 483 
Minimax approximation, Chebyshev, 231, 
233, 238 

Minimum 

golden ratio search, 401, 412 
gradient method, 412, 420 
Nelder-Mead, 405, 414 
Modified Euler method, 465
Muller’s method, 92, 97 
Multiple root, 75, 82, 87 (#21, #23) 
Multistep methods 

Adams-Bashforth-Moulton method, 
474, 482

Hamming’s method, 484 
Milne-Simpson method, 477, 483

N 

Natural cubic splines, 284, 285 
Near-minimax approximation, 231, 233,
238 

Nelder-Mead, 405, 414
Nested Multiplication, 10, 221 
Neumann boundary conditions, 541, 545 
Newton divided differences, 223 
Newton polynomial, 220, 224, 227 
Newton systems, 176, 179 
Newton’s method 

multiple roots, 75, 82, 87 (#21, #23) 
order of convergence, 77 
Newton-Cotes formulas, 344 
Newton-Raphson formula, 82, 84, 88 
(#23), 176, 179 

Nodes, 203, 207, 211, 213, 234, 344, 389
Norm 

Euclidean, 103, 162, 163 
matrix, 566 
Normal equations, 255 
Numerical differentiation, 313, 314, 320, 
329, 333 

backward differences, 334 
central differences, 313, 314, 329, 
340 (#7, #8) 

error formula, 313, 314, 316, 318 
forward differences, 334, 341 (#13) 
higher derivatives, 329, 333 
Richardson extrapolation, 320 
Numerical integration 
adaptive quadrature, 382, 387
Boole’s rule, 344, 372, 375, 380 (#3, 
#4), 389 (#3) 

composite rules, 350, 354, 358, 363 
cubic splines, 296 (#12) 
Gauss-Legendre integration, 389, 
392, 394 

midpoint rule, 366 (#12), 381 (#11) 
Newton-Cotes, 344 
Romberg integration, 373, 375, 377, 
378, 381 (#11) 

Simpson’s rule, 344, 353 (#9), 354,
359, 363, 370, 380 (#6), 387
Trapezoidal rule, 344, 354, 358, 363, 
368, 377 

O 

$O(h^n)$, 29, 32, 214, 313, 314, 329, 333,
373, 377, 437, 445, 452, 461,
475, 477, 505, 517, 527, 538
Odd function, 301 
Optimization 

golden ratio search, 401, 412 
gradient method, 412, 420
Nelder-Mead, 405, 414 
Optimum step size 

differential equations, 466, 476, 479 
differentiation, 316, 317
integration, 358, 382 
interpolation, 213, 234 
Order 

of approximation, 29, 32, 214, 313, 
329, 333, 373, 377
of convergence, 32, 75 
Orthogonal polynomials, Chebyshev, 238 

P 

Padé approximation, 243, 246
Parabolic equation, 526 
Partial derivative, 517, 527, 538 
Partial differential equations, 514, 516, 
526, 538 

elliptic equations, 538 
hyperbolic equations, 516 
parabolic equations, 526 
Partial pivoting, 133 
Periodic function, 298 
Piecewise 

continuous, 298 




cubic, 281 
linear, 280 
Pivoting 

element, 127 
row, 127 

strategies, 131, 133 
Plane rotations, 115, 581 
Poisson’s equation, 538, 548 
Polynomials 

calculus, 204 
characteristic, 559 
Chebyshev, 230, 233, 238, 240 
derivative, 204, 334, 336 
interpolation, 204, 207, 210, 211, 
217, 224, 227, 238 
Lagrange, 207, 211, 213, 236
Newton, 220, 224, 227
Taylor, 8, 26, 31, 189, 313, 329
trigonometric, 297, 303, 306 
wiggle, 273 

Power method, 568, 570, 573, 576 
Predator-prey model, 495 (#13) 
Predictor-corrector method, 474 
Projectile motion, 73, 442 (#8), 450 (#6) 
Propagation of error, 32 

Q 

QR method, 606 

Quadratic convergence, 76, 77, 82, 87 
(#21, #23) 

Quadratic formula, 39 (#12) 

Quadrature 

adaptive quadrature, 382, 387 
Boole’s rule, 344, 372, 375, 380 (#3, 
#4), 389 (#3) 

composite rules, 350, 354, 358, 363 
cubic splines, 296 (#12)
Gauss-Legendre integration, 389, 
392, 394 

midpoint rule, 366 (#12), 381 (#11) 
Newton-Cotes, 344 
Romberg integration, 373, 375, 377, 
378, 381 (#11) 

Simpson’s rule, 344, 353 (#9), 354, 
359, 363, 370, 380 (#6), 387
Trapezoidal rule, 344, 354, 358, 363, 
368, 377 


R 

Radioactive decay, 432 (#17) 

Rational function, 243 
Regula falsi method, 56, 60
Relative error, 24, 66 
Residual, 167 (#5), 253 
Richardson 

differential equations, 449 (#7), 456 
(#6), 471 (#7) 

numerical differentiation, 320, 322 
numerical integration, 375 
Rolle's theorem, 5, 6, 198 (#20), 212, 219 
(#13) 

Romberg integration, 373, 375, 377, 378, 
381 (#11) 

Root 

location, 68 

multiple, 75, 82, 87 (#21, #23) 
of equation, 53, 75 
simple, 75, 77, 87 (#22) 
synthetic division, 10, 200 
Root finding 

bisection, 53, 54, 59 
Muller, 92, 97 

multiple roots, 75, 82, 87 (#21, #23)
Newton, 82, 84, 88 (#23), 176, 179 
quadratic function, 39 (#12) 
regula falsi, 56, 60 
secant, 80, 84, 87 (#20)

Steffensen, 92, 95 
Root-mean-square error, 253 
Rotation, 115, 581 
Rounding error, 27

differentiation, 313, 314, 316, 318 
floating point number, 21 
Row operations, 127 
Runge phenomenon, 236 
Runge-Kutta methods, 458, 461, 466, 468,
488, 502 

Fehlberg method, 466, 469
Richardson extrapolation, 471 (#7) 
systems, 488 

S 

Scaled partial pivoting, 133 
Schur, 563 

Scientific notation, 19
Secant method, 80, 84, 87 (#20) 

Seidel iteration, 174, 179 
Sequence, 3, 41
convergent, 3 
error, 3 

geometric, 16, 51 
Sequential integration 
Boole, 372, 375 
Simpson, 370, 375
trapezoidal, 369, 375, 377 
Series 

binomial, 196 (#10) 
convergence, 8, 99 (#10-#14), 189,
194 

geometric, 16, 51 

Maclaurin, 243 

Taylor, 8, 26, 31, 189, 313, 329 
Shooting method, 498, 503 
Significant digits, 25 
Similarity transformation, 582 
Simple root, 75, 77, 87 (#22) 

Simpson’s rule, 344, 353 (#9), 354, 359, 
363, 370, 387 

three-eighths rule, 344, 353 (#9), 380
(#6)

Single precision, 22 
Single-step methods, 474 
Slope methods, 70, 80, 84 
SOR method, 545 
Spectral radius theorem, 566 
Splines 

clamped, 284, 285, 293
end constraints, 284 
integrating, 296 (#12) 
linear, 280 
natural, 284, 285 
Square-root algorithm, 72 
Stability of differential equations, 478, 481 
Steepest descent, 412, 420 
Steffensen’s method, 92, 95 
Step size 

differential equations, 466, 476, 479 
differentiation, 316, 318 
integration, 358, 382 
interpolation, 213, 234 
Stopping criteria, 58, 62 (#13) 

Successive over-relaxation, 545 
Surface area, 364 (#3) 

Synthetic division, 10, 200
Systems 

differential, 487 
linear, 114, 121, 123, 128, 136, 143,
150, 156, 163
nonlinear, 167, 174 

T 

Taylor series, 8, 26, 31, 189, 313, 329
Taylor's method, 451, 452, 455
Termination criterion 

bisection method, 58 
Newton’s method, 84 
regula falsi method, 58, 60 
Romberg integration, 378 
Runge-Kutta method, 469 
secant method, 84 
Transformation, elementary, 125 
Trapezoidal rule, 344, 354, 358, 363, 369, 
377 

Triangular factorization, 141, 143, 149
Trigonometric polynomials, 297, 303, 306 
Truncation error, 26, 313, 314 


U 

Unimodal function, 402 
Unstable error, 33 
Upper-triangularization, 136, 150 

V 

Vectors 

dot product, 103 
Euclidean norm, 103, 162, 163 

W 

Wave equation, 516, 519 

Weights, for integration rules, 344, 393 

Wiggle, 273 

Z 

Zeros 

of Chebyshev polynomials, 232 

of functions, 53, 75 

root finding, 40, 51, 70, 90, 167, 174 



